Preface


Quantum information theory studies the general laws of the transfer, storage, and
processing of information in systems obeying the laws of quantum mechanics. It took
shape as a self-consistent area of research in the 1990s, while its origin can be traced
back to the 1950s-1960s, when the basic ideas of reliable data transmission and of
Shannon’s information theory were developed. At the first stage, which covers the
period 1960-1980, the main issue consisted of the fundamental restrictions on the
possibilities of information transfer and processing caused by the quantum-mechanical
nature of its carrier. Modern technological developments, relying upon the achievements
of quantum electronics and quantum optics, suggest that in the foreseeable
future such restrictions will become the main obstacle limiting further extrapolation
of existing technologies and principles of information processing.

The emergence, in the 1980s-1990s, of the ideas of quantum computing, quantum
cryptography and the new communication protocols, on the other hand, made it possible
to discuss not only the restrictions, but also the new possibilities created by the use
of specific quantum resources, such as quantum entanglement, quantum complementarity,
and quantum parallelism. Quantum information theory provides the clue to understanding
these fundamental issues, and stimulates the development of experimental
physics, with potential importance to new, effective applications. At present,
investigations in the area of quantum information science, including information theory,
its experimental aspects and technological developments, are ongoing in advanced
research centers throughout the world.

The mathematical toolbox of “classical” information theory contains methods based
in probability theory, combinatorics, modern algebra, including algebraic geometry,
etc. For a mathematician sensitive to the impact of his research on the natural sciences,
information theory can be a source of deep ideas and new, challenging problems, with
sound motivation and applications. This equally, if not to a greater extent, applies to
quantum information theory, the scope of which turns out to be closely connected to
multilinear algebra and non-commutative analysis, convexity and asymptotic theory
of finite-dimensional normed spaces, subtle aspects of positivity and tensor products
in operator algebras, and the methods of random matrices. Nowadays, the intimate
connections to operator spaces and so-called “quantum functional analysis” have
been revealed and explored.

In 2002, the Moscow Independent University published the author’s lecture notes
(in Russian), in which an attempt was made at a mathematician’s introduction to
problems of quantum information theory. In 2010, a substantially expanded text was
published with the title “Quantum systems, channels, information”. The author’s
intention was to provide a widely accessible and self-contained introduction to the subject,
starting from primary structures and leading up to nontrivial results with rather
detailed proofs, as well as to some open problems. The present English text is a further
step in that direction, extending and improving on the Russian version of 2010.

The exposition is organized in concentric circles, the N-th round consisting of Parts 
I to N, where each circle is self-contained. The reader can restrict himself to any of 
these circles, depending on the depth of presentation that he demands. In particular, 
in Parts I to IV, we consider finite-dimensional systems and channels, whereas the
infinite-dimensional case is treated in the final Part V.

Part I starts with a description of the statistical structure of quantum theory. After
introducing the necessary mathematical prerequisites in Chapter 1, the central focus
in Chapters 2 and 3 is on discussing the key features of quantum complementarity
and entanglement. The former is reflected by the noncommutativity of the algebra
of observables of the system, while the latter is reflected by the tensor product
structure of composite quantum systems. Chapter 3 also contains first applications of the
information-theoretic approach to quantum systems.

In information theory, the notions of a channel and its capacity, giving a measure 
of ultimate information-processing performance of the channel, play a central role. In 
Chapter 4 of Part II, a review of the basic concepts and necessary results from classical 
information theory is provided, the quantum analogs of which are the main subject of 
the following chapters. The concepts of random coding and typicality are introduced, 
and then extended to the quantum case in Chapter 5. That chapter contains direct and 
self-consistent proofs of the quantum information bound and of the primary coding 
theorems for the classical-quantum channels, which will later serve as a basis for the 
more advanced capacity results in Chapter 8. 

Part III is devoted to the study of quantum channels and their entropy characteristics.
In Chapter 6, we discuss the general concept and structure of a quantum channel,
with the help of a variety of examples. From the point of view of operator algebras,
these are normalized, completely positive maps, the analog of Markov maps in
noncommutative probability theory, and play the role of morphisms in the category of
quantum systems. From the point of view of statistical mechanics, a channel gives
an overall description of the evolution of an open quantum system interacting with an
environment, a physical counterpart of the mathematical dilation theorem. Various
entropic quantities essential to characterize the information-processing performance,
as well as the irreversibility of the channel, are investigated in Chapter 7.

Part IV is devoted to the proofs of advanced coding theorems, which give the main
capacities of a quantum channel. Remarkably, in the quantum case, the notion of the
channel capacity splits, giving a whole spectrum of information-processing
characteristics, depending on the kind of data transmitted (classical or quantum), as well as
on the additional communication resources. In Chapter 8, we discuss the classical
capacity of a quantum channel, i.e. the capacity for transmitting classical data. We
touch upon the tremendous progress made recently in the solution of the related
additivity problem and point out the remaining questions. Chapter 9 is devoted to the
classical entanglement-assisted capacity and its comparison with unassisted capacity. In
Chapter 10, we consider reliable transmission of quantum information (i.e. quantum
states), which turns out to be closely related to the private transmission of classical
information. The corresponding coding theorems provide the quantum capacity and
the private classical capacity of a quantum channel.

In Part V, we pass from finite-dimensional to separable Hilbert space. Chapter 11
deals with the new obstacles characteristic of infinite-dimensional channels: singular
behavior of the entropy (infinite values, discontinuity), and the emergence of
the input channel constraints (e.g. finiteness of the signal energy) and of the
continuous optimizing state ensembles. Chapter 12 treats the bosonic Gaussian systems and
channels on the canonical commutation relations (many experimental demonstrations
of quantum information processing were realized in such “continuous variables”
systems, based in particular on the principles of quantum optics). We assume the reader
has some minor background in the field and start with a rather extended introduction
at the beginning of Chapter 12. Next, we describe and study in detail the Gaussian
states and channels. The main mathematical problems here are the structure of the
multi-mode quantum Gaussian channels and the computation of the various entropic
quantities characterizing their performance. While the classical entanglement-assisted
capacity is, in principle, computable for a general Gaussian channel, the quantum
capacity is found only for restricted classes of channels, and the unassisted classical
capacity in general presents an open analytical problem, namely that of verifying the
conjecture of “quantum Gaussian optimizers”, which is comparable in complexity to
the additivity problem (also open for the class of Gaussian channels), and appears to
be closely related to it.

This book does not intend to be an all-embracing text in quantum information theory,
and its content definitely reflects the author’s personal research interests and
preferences. For example, the important topics of entanglement quantification and error
correction are mentioned only briefly. An interested reader can find an account of
these in other sources, listed in the notes and references to the individual chapters.
Quantum information theory is in a stage of fast development and new, important
results continue to appear. Yet, we hope the present text will be a useful addition to the
existing literature, particularly for mathematically inclined readers eager to penetrate
the fascinating world of quantum information.

The basis for these lecture notes was a course taught by the author at the Moscow 
Institute of Physics and Technology, Moscow State University, and several Western 
institutions. The author acknowledges stimulating discussions, collaborations and 
invaluable support of R. Ahlswede, A. Barchielli, C. H. Bennett, G. M. D’Ariano, 
C. Fuchs, V. Giovannetti, O. Hirota, R. Jozsa, L. Lanz, O. Melsheimer, H. Neumann, 
M. B. Ruskai, P. W. Shor, Yu. M. Suhov, K. A. Valiev, R. Werner, A. Winter, M. Wolf. 



I extend special thanks to my colleagues Maxim Shirokov and Andrey Bulinsky for 
their careful reading of the manuscript and the suggestion of numerous improvements. 

This work was supported by the Russian Foundation for Basic Research, Fundamental
Research Programs of the Russian Academy of Sciences, and by the Cariplo Fellowship
organized by the Landau Network - Centro Volta.
Basic structures




Chapter 1 

Vectors and operators 


We will deal with quantum-mechanical systems described by finite-dimensional Hilbert
spaces. On the one hand, one can see already in that case, and perhaps most
manifestly, the distinctive features of quantum statistics. On the other hand,
finite-level systems are of the main interest to applications such as quantum communication,
cryptography, and computing (although in the field of quantum information theory
attention was recently attracted by the “continuous-variables systems,” which are
described by infinite-dimensional spaces, see Chapters 11, 12).

1.1 Hilbert space 

Let $\mathcal{H}$ be a $d$-dimensional complex linear space, $\dim\mathcal{H} = d < \infty$,
with the inner product $\langle\varphi|\psi\rangle$, $\varphi,\psi\in\mathcal{H}$, satisfying
the axioms of unitary space (see e.g. [138]). However, following physical rather than
mathematical tradition, we assume $\langle\varphi|\psi\rangle$ to be linear with respect to
the second argument $\psi$ and antilinear with respect to $\varphi$. Again following a
tradition in the physics literature, we shall call it a Hilbert space (although the emphasis
of the mathematical Hilbert space theory is on the infinite-dimensional case).

We shall use Dirac’s notation: a vector $\psi\in\mathcal{H}$ (which conventionally should be
thought of as a column vector) will often be denoted as $|\psi\rangle$. Correspondingly,
$\langle\varphi|$ will denote the linear function on $\mathcal{H}$ defined by the inner product

$$\langle\varphi| : \psi \to \langle\varphi|\psi\rangle, \qquad \psi\in\mathcal{H}.$$

These linear functions themselves form another Hilbert space, the dual space $\mathcal{H}^*$
of $\mathcal{H}$ (and they should be thought of as row vectors $\langle\varphi| = |\varphi\rangle^*$,
the Hermitian conjugate (adjoint) to $|\varphi\rangle$). The space $\mathcal{H}^*$ is
anti-isomorphic to $\mathcal{H}$ via the correspondence $|\psi\rangle \leftrightarrow \langle\psi|$.
Then the inner product $\langle\varphi|\psi\rangle$ is naturally thought of as the product of
the “bra” and “ket” vectors $\langle\varphi|$ and $|\psi\rangle$. The square of the norm, both
in $\mathcal{H}$ and $\mathcal{H}^*$, is $\langle\psi|\psi\rangle = \|\psi\|^2$.

This notation also allows for a convenient description of operators. For example,
the “outer product” of “ket” and “bra” vectors $A = |\psi\rangle\langle\varphi|$ describes the
rank one operator, which acts on a vector $|\chi\rangle$ according to the formula
$A|\chi\rangle = |\psi\rangle\langle\varphi|\chi\rangle$.
Let $\{e_i\}_{i=1}^{d}$ be an orthonormal basis in $\mathcal{H}$. The decomposition of an
arbitrary vector $\psi\in\mathcal{H}$ can then be written as

$$|\psi\rangle = \sum_{i=1}^{d} |e_i\rangle\langle e_i|\psi\rangle, \tag{1.1}$$





which is the same as

$$\sum_{i=1}^{d} |e_i\rangle\langle e_i| = I, \tag{1.2}$$

where $I$ is the unit operator in $\mathcal{H}$.

Exercise 1.1. Write the matrix representation for operators in $\mathcal{H}$, similar to the
representation (1.1) for vectors.
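The relations (1.1) and (1.2) are easy to check numerically. The following is a minimal
sketch, assuming NumPy is available; the helper name `ketbra`, the dimension and the random
test data are illustrative choices, not part of the text. The last lines show one possible
answer to Exercise 1.1, namely $A = \sum_{i,j} |e_i\rangle\langle e_i|A e_j\rangle\langle e_j|$.

```python
import numpy as np

d = 3
rng = np.random.default_rng(0)

# Columns of a unitary matrix form an orthonormal basis {e_i} of C^d.
# (QR factorization of a random complex matrix is just a convenient way to get one.)
basis, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
kets = [basis[:, i] for i in range(d)]

def ketbra(psi, phi):
    """Rank-one operator |psi><phi| as a d x d matrix."""
    return np.outer(psi, phi.conj())

# Relation (1.2): the sum of |e_i><e_i| is the unit operator.
identity = sum(ketbra(e, e) for e in kets)

# Relation (1.1): |psi> = sum_i |e_i><e_i|psi>.
# np.vdot conjugates its first argument, so it is linear in the second one.
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi_rebuilt = sum(e * np.vdot(e, psi) for e in kets)

# Matrix representation of an operator in the basis {e_i}.
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
elements = np.array([[np.vdot(ei, A @ ej) for ej in kets] for ei in kets])
A_rebuilt = sum(elements[i, j] * ketbra(kets[i], kets[j])
                for i in range(d) for j in range(d))
```

Here `elements[i, j]` plays the role of the matrix element $\langle e_i|A e_j\rangle$.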

An additional advantage of Dirac’s notation is that we do not need to write the
symbol of the vector, leaving only the label(s), e.g. we may write $|i\rangle$ instead of
$|e_i\rangle$, etc.

Sometimes we shall consider a real Hilbert (i.e. Euclidean) space. A fundamental
difference with the complex case is revealed by the polarization identity

$$\beta(\varphi,\psi) = \frac{1}{4}\sum_{k=0}^{3} (-i)^k\, \beta(\varphi + i^k\psi,\ \varphi + i^k\psi), \tag{1.3}$$

which allows us to uniquely restore all values of the form $\beta(\varphi,\psi)$, which is linear
with respect to the second argument and antilinear with respect to the first, from its
diagonal values $\beta(\psi,\psi)$, $\psi\in\mathcal{H}$ (in the real case, such a restoration is
possible only for symmetric forms). Due to this, in order to prove an operator equality $A = B$,
it is sufficient to establish the equality for all diagonal values
$\langle\psi|A\psi\rangle = \langle\psi|B\psi\rangle$, $\psi\in\mathcal{H}$.
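As a quick sanity check of (1.3), the sketch below evaluates both sides for the form
$\beta(\varphi,\psi) = \langle\varphi|A\psi\rangle$ with a randomly chosen operator $A$. It
assumes NumPy; the function names and the test data are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))  # arbitrary operator

def beta(phi, psi):
    """Form beta(phi, psi) = <phi|A psi>: antilinear in phi, linear in psi."""
    return np.vdot(phi, A @ psi)

def polarization(phi, psi):
    """Right-hand side of (1.3), built from diagonal values of beta only."""
    return sum((-1j) ** k * beta(phi + 1j ** k * psi, phi + 1j ** k * psi)
               for k in range(4)) / 4

phi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
```

Comparing `polarization(phi, psi)` with `beta(phi, psi)` shows that the off-diagonal value is
recovered from diagonal ones, even though $A$ here is not Hermitian.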

1.2 Operators 

If $A$ is an operator in $\mathcal{H}$ (throughout this book we deal only with linear operators),
then $A^*$ denotes its adjoint, defined by

$$\langle\varphi|A^*\psi\rangle = \langle A\varphi|\psi\rangle, \qquad \varphi,\psi\in\mathcal{H}. \tag{1.4}$$

An operator $A$ is called Hermitian if $A = A^*$. An (orthogonal) projector is a Hermitian
operator $P$ such that $P^2 = P$. The range of projector $P$ is the subspace

$$\mathcal{L} = \{\psi : P|\psi\rangle = |\psi\rangle\}.$$

If $\|\psi\| = 1$, then $|\psi\rangle\langle\psi|$ is the projector onto the unit vector
$|\psi\rangle$. More generally, for an orthonormal system $\{e_i\}_{i\in I}$, the operator
$\sum_{i\in I} |e_i\rangle\langle e_i|$ is the projector onto the subspace generated by
$\{e_i\}_{i\in I}$.

A unitary operator is an operator $U$ such that $U^*U = I$. In the finite-dimensional
case, which we consider here, this implies $UU^* = I$. A partial isometry is an
operator $U$ such that $U^*U = P$ is a projector. In this case $UU^* = Q$ is also a
projector. $U$ maps the range of $P$ onto the range of $Q$ isometrically, i.e. preserving
inner products and norms of vectors.
Theorem 1.2 (Spectral Decomposition). For a Hermitian operator $A$, there is an
orthonormal basis of eigenvectors $\{e_i\}$, to which correspond real eigenvalues $a_i$, such that

$$A = \sum_{i=1}^{d} a_i |e_i\rangle\langle e_i|. \tag{1.5}$$

Another useful form of the spectral decomposition is obtained if we consider the
distinct eigenvalues $\{a\}$ and the corresponding spectral projectors

$$E_a = \sum_{i:\, a_i = a} |e_i\rangle\langle e_i|.$$

The collection of distinct eigenvalues $\operatorname{spec}(A) = \{a\}$ is called the spectrum of
the operator $A$. Then

$$A = \sum_{a\in\operatorname{spec}(A)} a E_a. \tag{1.6}$$

This representation is unique up to the ordering of the eigenvalues. The collection of
projectors $\{E_a\}$ forms an orthogonal resolution of the identity:

$$E_a E_{a'} = \delta_{aa'} E_a, \qquad \sum_{a\in\operatorname{spec}(A)} E_a = I. \tag{1.7}$$

The Hilbert space $\mathcal{H}$ splits into the direct orthogonal sum of the ranges of the
projectors $\{E_a\}$, on which $A$ acts as multiplication by $a$.

Both unitary and Hermitian operators are special cases of normal operators, satisfying
$A^*A = AA^*$. For normal operators, the spectral decomposition (1.5) holds,
with complex eigenvalues $a_i$.
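The spectral projectors in (1.6) and the relations (1.7) can be illustrated numerically for a
Hermitian matrix. This is a minimal sketch assuming NumPy; grouping eigenvalues by rounding
(the `decimals` parameter) is a pragmatic assumption made here to identify numerically equal
eigenvalues, and the example matrix is hypothetical.

```python
import numpy as np

def spectral_projectors(A, decimals=9):
    """Return {a: E_a} for a Hermitian matrix A, grouping equal eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(A)
    projectors = {}
    # Columns of eigvecs are orthonormal eigenvectors e_i.
    for a, e in zip(np.round(eigvals, decimals), eigvecs.T):
        a = float(a) + 0.0  # normalizes -0.0 to 0.0 so keys are unambiguous
        projectors[a] = projectors.get(a, 0) + np.outer(e, e.conj())
    return projectors

# Example with a degenerate eigenvalue: spec(A) = {1, 3}, where 3 has multiplicity 2.
A = np.array([[2, 1, 0],
              [1, 2, 0],
              [0, 0, 3]], dtype=complex)
E = spectral_projectors(A)

# Reassemble A from (1.6) and the identity from (1.7).
A_rebuilt = sum(a * P for a, P in E.items())
identity = sum(E.values())
```

The dictionary `E` has one entry per distinct eigenvalue, so its size is the number of points
in the spectrum rather than the dimension $d$.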

1.3 Positivity 

A Hermitian operator $A$ is called positive, $A \ge 0$, if $\langle\psi|A\psi\rangle \ge 0$ for
all $\psi\in\mathcal{H}$. The eigenvalues of a positive operator are all nonnegative: $a \ge 0$
for $a\in\operatorname{spec}(A)$.

The operator is positive if and only if it can be represented as $A = B^*B$ for some
operator $B$. A positive operator has a unique positive square root, i.e. for any positive
operator $A$ there exists one and only one positive operator $B$, denoted by
$\sqrt{A} = A^{1/2}$, such that $B^2 = A$.

For an arbitrary Hermitian operator $A$, we have

$$A = A_+ - A_-, \tag{1.8}$$

where $A_+ = \sum_{a>0} a E_a$, $A_- = -\sum_{a<0} a E_a$ are positive operators called the
positive and negative part of the operator $A$.
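The decomposition (1.8) and the positive square root can both be computed from an
eigendecomposition. The sketch below assumes NumPy; the function names and the $2\times 2$
example (with eigenvalues $2$ and $-3$) are illustrative assumptions.

```python
import numpy as np

def positive_negative_parts(A):
    """Return (A_plus, A_minus) of a Hermitian matrix A, so that A = A_plus - A_minus."""
    eigvals, eigvecs = np.linalg.eigh(A)
    # Scaling column j of eigvecs by w[j] and multiplying by eigvecs^* gives sum_j w_j |e_j><e_j|.
    plus = (eigvecs * np.clip(eigvals, 0, None)) @ eigvecs.conj().T
    minus = (eigvecs * np.clip(-eigvals, 0, None)) @ eigvecs.conj().T
    return plus, minus

def positive_sqrt(A):
    """Positive square root of a positive matrix A."""
    eigvals, eigvecs = np.linalg.eigh(A)
    # Clip tiny negative eigenvalues that arise from rounding errors.
    return (eigvecs * np.sqrt(np.clip(eigvals, 0, None))) @ eigvecs.conj().T

A = np.array([[1, 2],
              [2, -2]], dtype=complex)  # Hermitian, eigenvalues 2 and -3
A_plus, A_minus = positive_negative_parts(A)
root = positive_sqrt(A_plus)
```

Since $A_+$ and $A_-$ are built from orthogonal spectral projectors, their product vanishes,
which the check `A_plus @ A_minus` confirms up to rounding.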
Theorem 1.3 (Polar Decomposition). Any operator $A$ in $\mathcal{H}$ allows the decomposition

$$A = U|A| = |A^*|U, \tag{1.9}$$

where $|A| = \sqrt{A^*A}$ is a positive operator and $U$ is unitary.

The support $\operatorname{supp} A$ of a positive operator $A$ is the span of eigenvectors of $A$
corresponding to positive eigenvalues. In the polar decomposition, the unitary operator
is uniquely defined only on $\operatorname{supp} A$.
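One convenient way to obtain a polar decomposition numerically is through the singular value
decomposition: if $A = W\Sigma V^*$, then $U = WV^*$, $|A| = V\Sigma V^*$ and
$|A^*| = W\Sigma W^*$. The sketch below assumes NumPy; the function name `polar` and the
random test matrix are illustrative, and this route is one choice among several.

```python
import numpy as np

def polar(A):
    """Return (U, |A|, |A*|) with A = U |A| = |A*| U, computed via the SVD."""
    W, s, Vh = np.linalg.svd(A)              # A = W diag(s) Vh
    U = W @ Vh                               # unitary factor
    abs_A = Vh.conj().T @ np.diag(s) @ Vh    # |A| = sqrt(A* A)
    abs_A_star = W @ np.diag(s) @ W.conj().T # |A*| = sqrt(A A*)
    return U, abs_A, abs_A_star

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
U, abs_A, abs_A_star = polar(A)
```

For a singular $A$ the SVD still returns some unitary `U`, reflecting the fact that the
unitary factor is not unique outside the support.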

Occasionally, we will make use of a real Hilbert space, i.e. Euclidean space, and
the operators in it. In that case, the Hermitian operators are replaced by symmetric
operators, and unitary operators by orthogonal ones, with formally similar definitions.
The polar decomposition of an operator $A$ holds, with $|A|$ symmetric positive and $U$
an orthogonal operator.

1.4 Trace and duality 

The trace of an operator $T$ is defined by

$$\operatorname{Tr} T = \sum_{i=1}^{d} \langle e_i|T e_i\rangle, \tag{1.10}$$

where $\{e_i\}$ is an arbitrary orthonormal basis.


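Exercise 1.4. Show that the definition does not depend on the choice of basis,
and that

$$\operatorname{Tr} A^* = \overline{\operatorname{Tr} A}, \qquad \operatorname{Tr} AB = \operatorname{Tr} BA, \tag{1.11}$$

where the bar denotes the complex conjugate. Show that

$$\operatorname{Tr} |\psi\rangle\langle\varphi| A = \langle\varphi|A\psi\rangle. \tag{1.12}$$

Show that for $A, B \ge 0$

$$\operatorname{Tr} AB \ge 0 \tag{1.13}$$

with equality if and only if $AB = 0$.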


The expression

$$\|T\|_1 = \operatorname{Tr}|T| \tag{1.14}$$

defines the trace norm on the complex linear space of all operators in $\mathcal{H}$. Notice that
for Hermitian $T$

$$\|T\|_1 = \sum_{i=1}^{d} |t_i|, \tag{1.15}$$

where $t_i$ are the eigenvalues of $T$. Another important norm is the operator norm

$$\|A\| = \|A\|_\infty = \max_{\|\psi\|=1} \|A\psi\|, \tag{1.16}$$

which for a Hermitian $A$ is equal to

$$\|A\| = \max_{a\in\operatorname{spec}(A)} |a|. \tag{1.17}$$

There is an important inequality

$$|\operatorname{Tr} TA| \le \|T\|_1 \|A\|, \tag{1.18}$$

where equality can be attained either for any fixed $T$ or for any fixed $A$: for a fixed
$T$ take $A = U^*$, where $U$ is the unitary from the polar decomposition $T = U|T|$.
Thus,

$$\|T\|_1 = \max_{U} |\operatorname{Tr} TU| = \max_{\|A\|=1} |\operatorname{Tr} TA|, \tag{1.19}$$

where the first maximum is taken over unitary operators $U$.
Similarly, for given $A$ the equality is attained for $T = |\psi\rangle\langle\psi|U^*$, where $U$
is the unitary from the polar decomposition $A = U|A|$, and $\psi$ is the normalized eigenvector
of $|A|$ with the maximal eigenvalue. It follows that

$$\|A\| = \max_{\|T\|_1=1} |\operatorname{Tr} TA|. \tag{1.20}$$
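The inequality (1.18) and the two cases of equality can be checked numerically. The sketch
below assumes NumPy and uses the standard facts that the trace norm is the sum of the singular
values and the operator norm is the largest singular value; the helper names and random test
matrices are illustrative.

```python
import numpy as np

def trace_norm(T):
    """||T||_1 = Tr|T|, i.e. the sum of the singular values of T."""
    return np.linalg.svd(T, compute_uv=False).sum()

def operator_norm(A):
    """||A||, i.e. the largest singular value of A."""
    return np.linalg.svd(A, compute_uv=False).max()

rng = np.random.default_rng(3)
d = 3
T = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

# Inequality (1.18) for a generic pair (T, A).
lhs = abs(np.trace(T @ A))
rhs = trace_norm(T) * operator_norm(A)

# Equality for fixed T: take U* with U the unitary factor of a polar decomposition T = U|T|.
W, s, Vh = np.linalg.svd(T)
U_T = W @ Vh
attained_for_T = abs(np.trace(T @ U_T.conj().T))

# Equality for fixed A: T0 = |psi><psi| U*, with psi the top eigenvector of |A|.
W, s, Vh = np.linalg.svd(A)
U_A = W @ Vh
psi = Vh[0].conj()  # right singular vector for the largest singular value
T0 = np.outer(psi, psi.conj()) @ U_A.conj().T
attained_for_A = abs(np.trace(T0 @ A))
```

Here `T0` has unit trace norm, so `attained_for_A` reproduces the maximum in (1.20).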

These facts underlie an important duality relation. The space of all operators in $\mathcal{H}$,
equipped with the trace norm, becomes the complex Banach space $\mathfrak{T}(\mathcal{H})$. The
same space equipped with the operator norm (and the product operation) is the Banach
algebra $\mathfrak{B}(\mathcal{H})$. These two Banach spaces are in mutual duality, which means that
every linear function on $\mathfrak{T}(\mathcal{H})$ has the form $T \to \operatorname{Tr} TA$ for
some $A\in\mathfrak{B}(\mathcal{H})$, with the norm given by (1.20), and, conversely, every linear
function on $\mathfrak{B}(\mathcal{H})$ has the form $A \to \operatorname{Tr} TA$ for some
$T\in\mathfrak{T}(\mathcal{H})$, with the norm (1.19).

The subscript $h$ will be applied to spaces of operators to denote the corresponding
real Banach spaces of Hermitian operators. Then, the real Banach spaces
$\mathfrak{T}_h(\mathcal{H})$ and $\mathfrak{B}_h(\mathcal{H})$ are again in mutual duality, provided
by the bilinear form $T, A \to \operatorname{Tr} TA$.
It follows from (1.11) that this form is real-valued for Hermitian $T, A$.

In the finite-dimensional case the spaces $\mathfrak{T}(\mathcal{H})$ and $\mathfrak{B}(\mathcal{H})$
coincide with the space of all operators in $\mathcal{H}$ and differ only by the norms. In the
infinite-dimensional case, however, they are substantially different (just as the classical
$\ell^1$ and $\ell^\infty$ spaces over finite versus infinite sets).

Exercise 1.5. Show that the complex dimensionality of the space of all operators
in $\mathcal{H}$ is $d^2$, while the real dimensionality of the subspace of all Hermitian
operators in $\mathcal{H}$ is again $d^2$. If $\mathcal{H}$ is a real, $d$-dimensional Hilbert space,
the dimensionality of the space of all operators in $\mathcal{H}$ is $d^2$, while the
dimensionality of the subspace of all symmetric operators in $\mathcal{H}$ is $d(d+1)/2$.
1.5 Convexity

A subset $\mathfrak{S}$ of a real linear space is convex if, for any finite collection of points
$\{S_j\}\subset\mathfrak{S}$ and any probability distribution $\{p_j\}$, the convex combination
$S = \sum_j p_j S_j$ is in $\mathfrak{S}$ (it is sufficient to impose this requirement only on
collections of two points, which means that if the set $\mathfrak{S}$ contains two points, it must
contain the whole segment connecting these points). In a convex set, a special role is reserved
for extreme points, which cannot be represented as a nontrivial convex combination of other
points. This is equivalent to the fact that $S = pS_1 + (1-p)S_2$, $0 < p < 1$, implies
$S = S_1 = S_2$, which means that there is no segment in $\mathfrak{S}$ containing $S$ as its
interior point. We denote by $\operatorname{ext}(\mathfrak{S})$ the set of all extreme points of a
convex set $\mathfrak{S}$. The following general result holds:

Theorem 1.6 (Carathéodory). Let $\mathfrak{S}$ be a compact convex subset of $\mathbb{R}^n$. In
this case, every point $S\in\mathfrak{S}$ can be represented as a convex combination of at most
$n+1$ extreme points $S_j\in\operatorname{ext}(\mathfrak{S})$:

$$S = \sum_{j=1}^{n+1} p_j S_j, \qquad S_j\in\operatorname{ext}(\mathfrak{S}). \tag{1.21}$$

As an example, consider the convex set of all probability distributions
$P = \{p_1,\dots,p_{n+1}\}$ on a set of $n+1$ points. Its extreme points are the degenerate
distributions, for which all $p_j$ are zero, except for one which is equal to 1. There are
$n+1$ extreme points and every point of this set is uniquely represented as their convex
combination, with the coefficients $p_j$. This set is called a simplex, and uniqueness of
the representation is the characteristic feature of this convex set.

A real function $f$ on a convex subset $\mathfrak{S}$ of a finite-dimensional linear space is
convex (concave) if

$$f\Big(\sum_j p_j S_j\Big) \le (\ge) \sum_j p_j f(S_j)$$

for an arbitrary convex combination of points $S_j\in\mathfrak{S}$. It is affine if it is both
convex and concave. More generally, a map $f$ from one convex set into another is called
affine if it preserves convex combinations:

$$f\Big(\sum_j p_j S_j\Big) = \sum_j p_j f(S_j).$$

Exercise 1.7. A continuous convex (in particular, affine) function on a compact
convex set $\mathfrak{S}$ attains its maximum at an extreme point of this set.


1.6 Notes and references 

The basics of the operator formalism of quantum mechanics were laid down in Dirac’s
classical treatise [53]. An excellent introduction to finite quantum systems is given
in Feynman’s lectures [59]. From the vast variety of textbooks on linear algebra,
we mention the modern course of Kostrikin and Manin [138], which takes into account
the needs of quantum theory, and the book of Glazman and Ljubich [68], which
is aimed at the active study of noncommutative, finite-dimensional analysis through
solving problems. A variety of topics in modern matrix analysis, including many matrix
inequalities, is covered in the book of Bhatia [27]. The fundamentals of convex
analysis are presented in, e.g., the books of Magaril-Il’jaev and Tikhomirov [154] and
Rockafellar [172].
States, observables, statistics


2.1 Structure of statistical theories 

2.1.1 Classical systems 

A classical system is described in terms of its phase space $\Omega$, the points $\omega$ of which
label deterministic states of the system. For simplicity, we consider in what follows
finite sets $\Omega$. By classical (statistical) state we denote a probability distribution
$P = \{p_\omega\}$ on $\Omega$. The collection of all probability distributions is the simplex
$\mathfrak{P}(\Omega)$, in which every point is uniquely represented as a convex combination of
extreme points, the pure states, given by probability distributions that are degenerate at the
points $\omega\in\Omega$. The simplest nontrivial system is the bit, the system with two pure
states 0, 1, for which the set of statistical states is isomorphic to the unit segment of
the real line.

Any random variable is a function $X = \{x_\omega\}$ on the phase space $\Omega$, defining a
decomposition of the space $\Omega$ into non-intersecting subsets $\Omega_x$, in which $X$ takes
on the values $x$. The indicators $E_x(\omega)$ of these subsets satisfy the conditions

$$E_x(\omega)E_y(\omega) = \delta_{x,y}E_x(\omega); \qquad \sum_x E_x(\omega) = 1.$$

Along with random variables, which will henceforth be called sharp observables, one
can consider unsharp observables which take the values $x$ with probabilities $M(x|\omega)$
(such as observables with random error or intentionally randomized). The collection
of conditional probabilities $M = \{M(x|\omega)\}$ is characterized by the properties

$$M(x|\omega) \ge 0; \qquad \sum_x M(x|\omega) = 1.$$

For a sharp observable the probabilities $M(x|\omega) = E_x(\omega)$ are equal to 0 or 1, i.e.
$M(x|\omega)^2 = M(x|\omega)$.

The probability distribution for the observable $M$ in the state $P$ is given by the
formula

$$\mu_P^M(x) = \sum_\omega p_\omega M(x|\omega). \tag{2.1}$$

For a smooth transition to quantum systems it is useful to introduce a matrix
representation of the classical quantities. Consider the Hilbert space spanned by a fixed
orthonormal basis $\{|\omega\rangle : \omega\in\Omega\}$. Given any function $f_\omega$ defined on
$\Omega$, we consider the diagonal operator

$$f = \sum_\omega f_\omega |\omega\rangle\langle\omega|.$$

Then a classical state gives rise to the operator
$P = \sum_\omega p_\omega |\omega\rangle\langle\omega|$, characterized by the properties

$$P \ge 0; \qquad \operatorname{Tr} P = 1. \tag{2.2}$$

A classical observable gives rise to the resolution of the identity, i.e. a collection of
operators $M = \{M_x\}$, where

$$M_x = \sum_\omega M(x|\omega) |\omega\rangle\langle\omega|,$$

satisfying the conditions

$$M_x \ge 0; \qquad \sum_x M_x = I. \tag{2.3}$$

For a sharp observable, the operators $M_x = E_x$ are pairwise orthogonal projectors.

The relation (2.1) for the probability distribution of an observable can be rewritten
as

$$\mu_P^M(x) = \operatorname{Tr} P M_x. \tag{2.4}$$
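The equivalence of (2.1) and (2.4) can be seen in a small numerical example. The sketch below
assumes NumPy; the three-point phase space, the state `p` and the two-outcome unsharp
observable `M` are hypothetical numbers chosen only for illustration.

```python
import numpy as np

# Classical state P = {p_omega} on a phase space with three points.
p = np.array([0.5, 0.3, 0.2])

# Unsharp observable with two outcomes: M[x, omega] = M(x|omega).
# Each column is a probability distribution over the outcomes x.
M = np.array([[0.9, 0.2, 0.0],
              [0.1, 0.8, 1.0]])

# Diagonal operator representation, as in (2.2) and (2.3).
P_op = np.diag(p)
M_ops = [np.diag(row) for row in M]

# Outcome distribution from the classical formula (2.1) ...
mu_classical = M @ p
# ... and from the trace formula (2.4).
mu_trace = np.array([np.trace(P_op @ Mx) for Mx in M_ops])
```

Because all operators involved are diagonal in the same basis, they commute, which is the
matrix-level expression of the classical nature of this model.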

2.1.2 Axioms of statistical description 

At the basis of the mathematical structure of quantum theory, as well as of any other
statistical theory, lies the separation of a statistical experiment into the two stages
of preparation and measurement, which underlies the duality between states and
observables. The following set of axioms, which is applicable to any statistical theory,
stresses the parallel between the statistical description of classical and quantum
systems.

Axiom 1. Let there be a given set $\mathfrak{S}$, whose elements are called states, and a set
$\mathfrak{M}$, whose elements are called (finitely-valued) observables. For arbitrary
$M\in\mathfrak{M}$, there is a finite set $\mathcal{X}^M$ of outcomes. For arbitrary
$S\in\mathfrak{S}$ and $M\in\mathfrak{M}$ there is a probability distribution $\mu_S^M$ on
$\mathcal{X}^M$, called the probability distribution of the observable $M$ in
the state $S$.

The state $S$ is interpreted as a more or less detailed description of the preparation
of a statistical ensemble, i.e. a sample of independent individual representatives of
the system under consideration, and the observable $M$, as a quantity that can be measured
by a definite apparatus for each representative in the given ensemble. Axiom 1
thus presupposes the reproducibility of the individual experiments and the stability of
frequencies under independent repetitions.
[Figure: a preparation stage, labeled by the state $S$, followed by a measurement stage.]

Figure 2.1. A quantum state described by a density operator $S$ characterizes the preparation
of a system, whereas the statistics of a measurement is described by a probability measure
$\mu_S^M(x)$, where $x$ labels the possible outcomes.


The following axiom expresses the possibility of the mixing of statistical ensembles.

Axiom 2. For arbitrary $S_1, S_2\in\mathfrak{S}$ and an arbitrary number $p$, $0 < p < 1$, there
exists an $S\in\mathfrak{S}$ such that $\mu_S^M = p\mu_{S_1}^M + (1-p)\mu_{S_2}^M$ for all
$M\in\mathfrak{M}$. $S$ is called the mixture of states $S_1$ and $S_2$ with weights $p$, $1-p$.

The next axiom describes the possibility of processing the information obtained
from a measurement of an observable. Let $M_1, M_2\in\mathfrak{M}$, and let $f$ be a function
from $\mathcal{X}^{M_1}$ to $\mathcal{X}^{M_2}$ such that, for all $S\in\mathfrak{S}$,

$$\mu_S^{M_2}(y) = \sum_{x:\, f(x)=y} \mu_S^{M_1}(x);$$

here, the observable $M_2$ is called the coarse-graining of the observable $M_1$. In this case,
we write $M_2 = f\circ M_1$.

Axiom 3. For an arbitrary observable $M_1\in\mathfrak{M}$ and an arbitrary (necessarily,
finitely-valued) function $f$ on $\mathcal{X}^{M_1}$, there exists an observable
$M_2\in\mathfrak{M}$, such that $M_2 = f\circ M_1$.
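At the level of outcome distributions, coarse-graining simply sums the probabilities of all
outcomes $x$ that are mapped by $f$ to the same $y$. The following sketch uses only the
Python standard library; the function name `coarse_grain` and the die/parity example are
hypothetical illustrations, not taken from the text.

```python
from collections import defaultdict

def coarse_grain(mu, f):
    """Distribution of f o M, given the distribution mu of M as a dict {x: probability}.

    Implements mu2(y) = sum of mu(x) over all x with f(x) = y.
    """
    out = defaultdict(float)
    for x, prob in mu.items():
        out[f(x)] += prob
    return dict(out)

# Hypothetical example: a fair die coarse-grained to the parity of its outcome.
mu_die = {x: 1 / 6 for x in range(1, 7)}
mu_parity = coarse_grain(mu_die, lambda x: "even" if x % 2 == 0 else "odd")
```

The parity observable carries less information than the die itself, which is the sense in
which coarse-graining induces a partial order on observables.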

A pair of non-empty sets $(\mathfrak{S}, \mathfrak{M})$ satisfying the axioms 1-3 will be called
a statistical model. The statistical model is said to be separated if the following holds.

Axiom 4. From the fact that $\mu_{S_1}^M = \mu_{S_2}^M$ for all $M\in\mathfrak{M}$ it follows that
$S_1 = S_2$, and from $\mu_S^{M_1} = \mu_S^{M_2}$ for all $S\in\mathfrak{S}$ it follows that
$M_1 = M_2$.

For a separated model both the operation of mixing in $\mathfrak{S}$ and the coarse-graining in
$\mathfrak{M}$ are uniquely defined. Thus, the set of states $\mathfrak{S}$ obtains a convex
structure, and the set of observables $\mathfrak{M}$ has the structure of a partial order.

Observables $M_1,\dots,M_m$ are called compatible if they are all coarse-grainings of
some observable $M$, that is, $M_j = f_j\circ M$ for $j = 1,\dots,m$. The outcomes of
compatible observables can be obtained as a result of data post-processing in a
single-measurement experiment. Statistical models in which all observables are compatible
are, in fact, classical.
Proposition 2.1. Let $(\mathfrak{S}, \mathfrak{M})$ be a separated statistical model in which
there exists the maximal observable $M_*$, such that all other observables are obtained by
coarse-grainings of this maximal observable. In this case, there exist a finite set $\Omega$, a
one-to-one affine map $S \to P_S$ of the convex set of states $\mathfrak{S}$ into the set of all
probability distributions on $\Omega$, and a one-to-one map $M \to f_M$ of the set of observables
$\mathfrak{M}$ onto a subset of random variables on $\Omega$, preserving the relation of
coarse-graining, such that

$$\mu_S^M(x) = \sum_{\omega:\, f_M(\omega)=x} P_S(\omega). \tag{2.5}$$

Proof. Let $\Omega$ be the set of outcomes of the maximal observable $M_*$, and let $P_S(\omega)$
be its probability distribution in a state $S$. According to the assumption of maximality,
for any observable $M$ there is a function $f_M$, such that the relation (2.5) holds. The
fact that the maps $S \to P_S$ and $M \to f_M$ are one-to-one follows from the
separatedness of the model. Checking the properties of the maps is left as an exercise to
the reader. □

Thus the outcomes of this “maximal observable” constitute the phase space $\Omega$ of
the system, and all states are represented by probability measures on $\Omega$. The main
classical model is the Kolmogorov model, in which the set of states is the collection of all
probability distributions $\mathfrak{P}(\Omega)$, and observables are described by random
variables on $\Omega$. The above proposition means that a statistical model in which all the
observables are compatible can be embedded within the Kolmogorov model.

Another important classical model, which we call the Wald model, differs from
Kolmogorov’s in that it embraces “unsharp” or “randomized” classical observables
(see Section 2.1.1). Randomization was consistently used by Wald in statistical decision
theory, and by von Neumann in game theory. To obtain a result similar to
Proposition 2.1, one should introduce stochastic versions of coarse-graining and
compatibility. Let $M_1, M_2\in\mathfrak{M}$, and let $\Pi$ be a transition probability from
$\mathcal{X}^{M_1}$ to $\mathcal{X}^{M_2}$, such that for all $S\in\mathfrak{S}$

$$\mu_S^{M_2}(y) = \sum_x \Pi(y|x)\,\mu_S^{M_1}(x).$$

In this case, the observable $M_2$ is called the stochastic coarse-graining of the observable
$M_1$. Observables $M_1,\dots,M_m$ are stochastically compatible if they are all stochastic
coarse-grainings of one observable $M$. In the Wald model, all observables are
stochastically compatible and any separated model with this property can be embedded
within a Wald model.

In the statistical model of quantum mechanics, states and observables are described by operators in a Hilbert space. Much effort has been spent on obtaining a reasonable axiomatic scheme leading to the Hilbert space formalism of quantum mechanics. However, this is not our goal here and we will proceed in a different way. We take for granted the description of quantum states as density operators in a Hilbert space and derive from it the most general concept of quantum observable (including both “sharp” and “unsharp” observables), following from the axioms of the statistical model.

2.2 Quantum states 

A state of a quantum-mechanical system represents a statistical ensemble of independent, identically prepared copies of the system.

Definition 2.2. A quantum state is described by a density operator, i.e. a Hermitian operator $S$ in the Hilbert space $\mathcal{H}$ of the system, satisfying the conditions

$$S \ge 0, \qquad \operatorname{Tr} S = 1.$$

Let $\mathfrak{S}(\mathcal{H})$ be the set of all density operators. It is a convex subset of the real linear space of all Hermitian operators in $\mathcal{H}$. A convex combination $S = \sum_j p_j S_j$ of density operators describes the mixture of the corresponding statistical ensembles, obtained by taking ensembles prepared in the states $S_j$ and mixing them with weights $p_j$. In a quantum statistical ensemble there are two kinds of randomness. One is due to fluctuations in the classical parameters of the preparation procedure. The other is the intrinsically quantum, irreducible randomness present in any quantum state. The following theorem characterizes states without classical randomness.

Theorem 2.3. Extreme points of the set of quantum states $\mathfrak{S}(\mathcal{H})$, called pure states, are precisely the one-dimensional projectors.

Proof. The spectral decomposition of the Hermitian operator $S$ reads

$$S = \sum_{i=1}^{d} s_i\, |e_i\rangle\langle e_i|, \qquad s_i \ge 0, \quad \sum_{i=1}^{d} s_i = 1, \tag{2.6}$$

where $s_i$ are the eigenvalues and $|e_i\rangle$ are the eigenvectors of $S$. If $S$ is extreme, this sum can contain only one nonzero term, implying that $S$ is a one-dimensional projector. Conversely, let $S$ be a one-dimensional projector, and

$$S = pS_1 + (1 - p)S_2, \qquad 0 < p < 1.$$

Taking the square, we obtain

$$0 = S - S^2 = pS_1(I - S_1) + (1 - p)S_2(I - S_2) + p(1 - p)(S_1 - S_2)^2. \tag{2.7}$$

The right-hand side is the sum of three positive operators, each of which must therefore be equal to zero. But this implies $S_1 = S_2 = S$, i.e. $S$ is extreme. □

From the viewpoint of probability theory, one-dimensional projectors are the noncommutative analog of distributions that are degenerate at some phase space point, while the role of the uniform distribution is played by the chaotic state, with the density operator $S = \frac{1}{d} I$.

Exercise 2.4. Show that if $\dim \mathcal{H} = d$, then $\mathfrak{S}(\mathcal{H})$ can be embedded into the real vector space of dimensionality $n = d^2 - 1$. If $\mathcal{H}$ is a Euclidean (real Hilbert) space, then $n = d(d + 1)/2 - 1$.


The spectral decomposition (2.6) shows that every quantum state is a mixture of no more than $d$ pure states, where $d = \dim \mathcal{H}$. Thus, in the case of the set of quantum states, the Caratheodory Theorem overestimates the number of extreme points that are actually present in the mixture (1.21). On the other hand, this theorem gives the exact value for the simplex of probability distributions on a “phase space” $\Omega = \{1, \dots, n + 1\}$, representing the statistical states of a classical system. This suggests interpreting quantum theory as a probabilistic model with specific nonclassical constraints (a hidden-variable theory). For a single quantum system such a viewpoint is possible, although so far it has not proved to be productive compared to standard quantum mechanics. On the other hand, for composite systems this viewpoint leads to an inevitable contradiction with the physical principle of locality (separability) (see Section 3.2.1).

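The simplest, and yet fundamental, example is the qubit, a two-level quantum system with $\dim \mathcal{H} = 2$. We use the canonical basis $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$. It is convenient to introduce a basis in the real space of $2 \times 2$ Hermitian matrices, formed by the unit matrix $I$ and the Pauli matrices

$$\sigma_x = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \qquad \sigma_y = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}, \qquad \sigma_z = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.$$

In particular, a density operator $S \in \mathfrak{S}(\mathcal{H})$ can be represented as

$$S = \frac{1}{2}\,(I + a_x \sigma_x + a_y \sigma_y + a_z \sigma_z) = \frac{1}{2} \begin{bmatrix} 1 + a_z & a_x - i a_y \\ a_x + i a_y & 1 - a_z \end{bmatrix}. \tag{2.8}$$

The condition $\det S \ge 0$ imposes the following constraint on the Stokes parameters $a = (a_x, a_y, a_z)$:

$$|a|^2 = a_x^2 + a_y^2 + a_z^2 \le 1.$$

Thus, $\mathfrak{S}(\mathcal{H})$ as a convex set is isomorphic to the unit ball in $\mathbb{R}^3$, which we shall call the Bloch ball.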
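
As a numerical illustration of (2.8) and of the Bloch ball constraint, the following minimal Python sketch builds the matrix (2.8) from given Stokes parameters and computes its eigenvalues from the trace and the determinant. The function names and the sample vector are illustrative assumptions and do not come from the text.

```python
import math

def density_from_bloch(ax, ay, az):
    """Matrix S = (I + ax*sx + ay*sy + az*sz)/2 of equation (2.8)."""
    return [[(1 + az) / 2, complex(ax, -ay) / 2],
            [complex(ax, ay) / 2, (1 - az) / 2]]

def eigenvalues_hermitian_2x2(m):
    """Eigenvalues of a 2x2 Hermitian matrix via its trace and determinant."""
    tr = (m[0][0] + m[1][1]).real
    det = (m[0][0] * m[1][1] - m[0][1] * m[1][0]).real
    root = math.sqrt(max(tr * tr / 4 - det, 0.0))
    return tr / 2 - root, tr / 2 + root

a = (0.3, -0.4, 0.5)            # hypothetical Stokes parameters with |a| < 1
S = density_from_bloch(*a)
norm_a = math.sqrt(sum(c * c for c in a))
low, high = eigenvalues_hermitian_2x2(S)
print(low, high)                # the smaller eigenvalue is >= 0 exactly when |a| <= 1
```

For this sample vector the two eigenvalues agree with $(1 \mp |a|)/2$, so positivity of $S$ is indeed equivalent to $|a| \le 1$.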

Exercise 2.5. Show that the density operator (2.8) has the eigenvalues $\frac{1}{2}(1 \pm |a|)$. Hint: use the results of Exercise 1.5.

The pure states are characterized by the condition $a_x^2 + a_y^2 + a_z^2 = 1$ and form the Bloch sphere. By introducing Euler angles $\theta, \varphi$, so that $a_z = \cos\theta$, $a_x + i a_y = \sin\theta\, e^{i\varphi}$, we have $S = |\psi(a)\rangle\langle\psi(a)|$, where

$$|\psi(a)\rangle = \begin{bmatrix} \cos\frac{\theta}{2}\, e^{-i\varphi/2} \\ \sin\frac{\theta}{2}\, e^{i\varphi/2} \end{bmatrix}. \tag{2.9}$$


For an electron, which is a spin-1/2 particle, the vector $\psi(a)$ describes the pure state with the spin direction $a$, produced by a Stern-Gerlach filter with this direction of the magnetic field gradient. The mixed state with $a_x = a_y = a_z = 0$, described by the density operator $S = \frac{1}{2} I$, for which all spin directions are equiprobable, is the chaotic state (see the Feynman lectures [59] for more detail).
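
The correspondence between the vector (2.9) and the matrix (2.8) can be checked numerically. The sketch below, with hypothetical helper names and arbitrarily chosen angles, forms the projector onto $\psi(a)$ and compares it entrywise with (2.8) for the Stokes parameters $a_z = \cos\theta$, $a_x + i a_y = \sin\theta\, e^{i\varphi}$.

```python
import cmath
import math

def psi(theta, phi):
    """State vector (2.9) for the direction with Euler angles theta, phi."""
    return [math.cos(theta / 2) * cmath.exp(-1j * phi / 2),
            math.sin(theta / 2) * cmath.exp(1j * phi / 2)]

def projector(v):
    """One-dimensional projector |v><v| for a unit vector v."""
    return [[v[i] * v[j].conjugate() for j in range(2)] for i in range(2)]

def density_from_bloch(ax, ay, az):
    """Matrix (2.8) for the Stokes parameters (ax, ay, az)."""
    return [[(1 + az) / 2, complex(ax, -ay) / 2],
            [complex(ax, ay) / 2, (1 - az) / 2]]

theta, phi = 1.1, 2.3           # arbitrary sample angles (assumption)
a = (math.sin(theta) * math.cos(phi),
     math.sin(theta) * math.sin(phi),
     math.cos(theta))
P = projector(psi(theta, phi))
S = density_from_bloch(*a)
max_diff = max(abs(P[i][j] - S[i][j]) for i in range(2) for j in range(2))
print(max_diff)                 # zero up to rounding error
```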

Another important example of a two-level system is the polarization of a monochromatic photon, as visualized in classical optics experiments. In this case, the parameter $\theta/2$ is the angle of linear polarization, while $\varphi$ characterizes circular polarization. For a linearly polarized photon $\varphi = 0$. In particular, for a vertically polarized photon $\theta = 0$, while for a horizontally polarized one $\theta = \pi$.


2.3 Quantum observables 

2.3.1 Quantum observables from the axioms 

Consider a statistical model in which the state space is the convex set $\mathfrak{S}(\mathcal{H})$ of all density operators in a Hilbert space $\mathcal{H}$. Let $M$ be an observable. Now, by Axiom 2, the probabilities $\mu_S^M(x)$ of the outcomes of the measurement should be affine functions of the state.

Theorem 2.6. Let the map $S \to \mu_S$ be an affine function on $\mathfrak{S}(\mathcal{H})$ such that $0 \le \mu_S \le 1$. In this case, there exists an operator $M$ on $\mathcal{H}$ with $0 \le M \le I$ such that, for every $S \in \mathfrak{S}(\mathcal{H})$,

$$\mu_S = \operatorname{Tr} SM. \tag{2.10}$$

Proof (Sketch). Let us show that it is possible to extend the map $S \to \mu_S$ to a well-defined linear function on the space $\mathfrak{B}_h(\mathcal{H})$ of all Hermitian operators in $\mathcal{H}$.

First, we note that by separating the positive and negative parts in the spectral decomposition (1.6), every Hermitian operator $A$ can be represented as the difference of two positive operators, e.g. as in (1.8). Normalizing by the traces, we can always write

$$A = t_+ S_+ - t_- S_-, \qquad t_\pm \ge 0, \quad S_\pm \in \mathfrak{S}(\mathcal{H}). \tag{2.11}$$

The representation of an operator $A$ in the form (2.11) is, of course, not unique.

Exercise 2.7. Using the affinity of the function $S \to \mu_S$, show that the value $\mu_A = t_+ \mu_{S_+} - t_- \mu_{S_-}$ is uniquely defined (i.e. depends solely on the operator $A$, but not on the concrete decomposition of $A$ into the difference of positive operators), and that $A \to \mu_A$ is a real linear function.

Next, every operator $A$ can be represented as $A = A_1 + iA_2$, where $A_1 = \frac{1}{2}(A + A^*)$, $A_2 = \frac{1}{2i}(A - A^*)$ are Hermitian operators. This representation is unique, and by setting $\mu_A = \mu_{A_1} + i\mu_{A_2}$ we obtain the unique complex linear extension of the function $\mu_A$ to the space of all linear operators in $\mathcal{H}$. Any such function evidently has the form

$$\mu_A = \operatorname{Tr} AM, \tag{2.12}$$

where $M$ is an operator in $\mathcal{H}$. Taking $A = |\psi\rangle\langle\psi|$, where $\psi$ is a unit vector, and using (1.12), we have, by assumption, $0 \le \langle\psi|M\psi\rangle \le 1$ for all $\psi$, hence $0 \le M \le I$.

□

A collection of operators $\{M_x, x \in \mathcal{X}\}$ is called a resolution of the identity or a probability operator-valued measure¹ (POVM) on $\mathcal{X}$ if

$$M_x^* = M_x \ge 0, \qquad \sum_{x \in \mathcal{X}} M_x = I.$$

Corollary 2.8. Let $S \to \{\mu_S(x), x \in \mathcal{X}\}$ be an affine map of the convex set of quantum states $\mathfrak{S}(\mathcal{H})$ into the set of probability distributions on a finite set $\mathcal{X}$. In this case, there exists a POVM $M = \{M_x\}$ in $\mathcal{H}$ such that

$$\mu_S(x) = \operatorname{Tr} S M_x, \qquad x \in \mathcal{X}. \tag{2.13}$$

The proof of the corollary is left as an exercise to the reader. This corollary leads to the following

Definition 2.9. A quantum observable with outcomes in $\mathcal{X}$ is described by a POVM $\{M_x, x \in \mathcal{X}\}$. The probability distribution of the observable $M = \{M_x\}$ in a state $S$ is given by the formula (2.13).
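
To make the statistical formula (2.13) concrete, here is a small Python sketch that evaluates $\operatorname{Tr} S M_x$ for a qubit. The two-outcome unsharp observable $M_0 = \frac{1}{2}(I + \eta\sigma_z)$, $M_1 = \frac{1}{2}(I - \eta\sigma_z)$ with $\eta = 0.8$ and the chosen state are hypothetical examples introduced only for this illustration.

```python
def matmul(p, q):
    """Product of two square matrices given as nested lists."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(p):
    return sum(p[i][i] for i in range(len(p)))

def outcome_distribution(state, povm):
    """Probabilities mu_S(x) = Tr S M_x for every component M_x, as in (2.13)."""
    return [complex(trace(matmul(state, m))).real for m in povm]

eta = 0.8                                          # assumed sharpness parameter
M0 = [[(1 + eta) / 2, 0.0], [0.0, (1 - eta) / 2]]
M1 = [[(1 - eta) / 2, 0.0], [0.0, (1 + eta) / 2]]

# Pure state with Stokes parameters (0.6, 0, 0.8), written as in (2.8).
S = [[(1 + 0.8) / 2, 0.6 / 2], [0.6 / 2, (1 - 0.8) / 2]]

probabilities = outcome_distribution(S, [M0, M1])
print(probabilities)                               # two nonnegative numbers summing to 1
```

Since $M_0 + M_1 = I$ and both components are positive, the output is a probability distribution for every density operator $S$, although neither component is a projector.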

By accepting this definition of a quantum observable, we introduce the maximal statistical model, with the state space coinciding with the set of all density operators in $\mathcal{H}$. In the standard expositions of quantum mechanics, a more restricted notion of quantum observable is used.

Definition 2.10. The quantum observable is called sharp if all operators $M_x = E_x$ are projectors: $E_x^2 = E_x$.


¹ The term “positive operator-valued measure” is also used in the literature, which is however less precise, as it does not reflect the normalization of the measure.

Exercise 2.11. Show that the condition $E_x^2 = E_x$, for all outcomes $x$, is equivalent to $E_x E_y = \delta_{xy} E_x$ for all $x, y$, i.e. the projectors $E_x$ are mutually orthogonal.

The corresponding resolutions of the identity are called orthogonal. Sharp observables are thus described by orthogonal resolutions of the identity in $\mathcal{H}$. Observables for which the outcomes are real numbers, i.e. $\mathcal{X} \subset \mathbb{R}$, are called real. The spectral decomposition

$$X = \sum_{x \in \mathcal{X}} x E_x$$

sets a one-to-one correspondence between sharp real observables $E = \{E_x\}$ and Hermitian operators $X$ in $\mathcal{H}$ (which are also called observables). Coarse-graining for such observables takes the form of functional calculus. Namely, for a function $f : \mathbb{R} \to \mathbb{R}$, the observable $f \circ E$ corresponds to the Hermitian operator $f(X)$ (Exercise).

The expectation (mean value) of a sharp real observable $X$ in the state $S$ is given by the Born-von Neumann statistical formula

$$\mathsf{E}_S(X) = \sum_{x \in \mathcal{X}} x\, \mu_S^E(x) = \operatorname{Tr} SX.$$

The relation (2.13) can be regarded as an extension of this formula.

The statistical model in which states are described by density operators, and observables by Hermitian (self-adjoint) operators, was considered in great detail by von Neumann [212].

2.3.2 Compatibility and complementarity 

Definition 2.12. The commutator of the operators $X, Y$ is $[X, Y] = XY - YX$. Operators $X, Y$ commute if $[X, Y] = 0$.

Theorem 2.13. Let $E = \{E_x\}$ and $F = \{F_y\}$ be sharp observables. In this case, the following are equivalent:

i. $E, F$ are compatible;

ii. the projectors $E_x$ and $F_y$ commute for all $x, y$.

If $E, F$ are real observables, this is equivalent to

iii. the corresponding Hermitian operators $X = \sum_x x E_x$, $Y = \sum_y y F_y$ commute.

Proof.

ii ⇒ i. Put $M_{x,y} = E_x F_y = F_y E_x$. Since the product of commuting projectors is a projector, $M = \{M_{x,y}\}$ is a (sharp) observable. Moreover, $E = f \circ M$, $F = g \circ M$, where $f(x, y) = x$, $g(x, y) = y$. Thus, $E, F$ are compatible.

i ⇒ ii. If $E, F$ are compatible, there exists an observable $M = \{M_z\}$ such that

$$E_x = \sum_{z : f(z) = x} M_z, \qquad F_y = \sum_{z : g(z) = y} M_z$$

for some functions $f, g$ (we agree to set the sum equal to zero if there is no $z$ satisfying the condition in question).

Lemma 2.14. Let $0 \le A \le P$, where $P$ is a projector. Then $[A, P] = 0$.

Proof. We have $(I - P)A(I - P) = 0$, hence $\sqrt{A}(I - P) = 0$, hence $A(I - P) = 0$, hence $A = AP$, hence $PA = (AP)^* = AP$. □

Now, if $f(z) = x$, then $E_x \ge M_z$ and by the lemma $[E_x, M_z] = 0$. If $f(z) \ne x$, then $I - E_x \ge M_z$, and again $[E_x, M_z] = 0$. Thus, $E_x$ commutes with all $M_z$ and hence with $F_y = \sum_{z : g(z) = y} M_z$.

ii ⇒ iii is obvious. Conversely, let $[X, Y] = 0$. In this case, $[f(X), g(Y)] = 0$ for arbitrary polynomials $f, g$. Taking polynomials that vanish at all points of the corresponding spectra except $x$ (resp. $y$), we obtain $[E_x, F_y] = 0$. □

If $E, F$ are compatible, then denoting $M_{xy} = \sum_{z : f(z) = x,\, g(z) = y} M_z$, we have

$$E_x = \sum_y M_{xy}, \qquad F_y = \sum_x M_{xy}.$$

The observable $M$ describes the statistics of a joint measurement of $E, F$. Their joint probability distribution in the state $S$ is given by

$$\mu_S^{E,F}(x, y) = \operatorname{Tr} S M_{xy}.$$

One can similarly define a joint measurement and probability distribution for an arbitrary finite collection of compatible observables.

Exercise 2.15. Show that the only observable compatible with all quantum observables is a constant, i.e. the observable equal to a scalar multiple of the unit operator.

The existence of (a vast variety of) incompatible observables is a manifestation of the quantum feature of complementarity. Physical measurements on micro-objects are implemented by macroscopic experimental devices, each of which requires a specific organization of the environment in space and time (as, e.g., in Stern-Gerlach experiments). Different ways of organization, corresponding to measurements of different observables, may be mutually exclusive (despite the fact that they relate to the identically prepared micro-object), i.e. complementary. Complementarity is the first fundamental distinction between quantum and classical statistical models.

Example 2.16. Consider a spin-1/2 particle, and let the unit vector $a = (a_x, a_y, a_z)$ give a direction. Then

$$\sigma(a) = a_x \sigma_x + a_y \sigma_y + a_z \sigma_z = \begin{bmatrix} a_z & a_x - i a_y \\ a_x + i a_y & -a_z \end{bmatrix} \tag{2.14}$$

is the observable of the projection of the spin² onto the direction $a$. Indeed, the operator $\sigma(a)$ has the eigenvalues $\pm 1$ (spin along and opposite to the direction $a$), and the spectral decomposition is given by

$$\sigma(a) = |\psi(a)\rangle\langle\psi(a)| - |\psi(-a)\rangle\langle\psi(-a)|. \tag{2.15}$$

Recall that $a$ has the Euler angles $(\theta, \varphi)$, and $\psi(a)$, given by (2.9), is the vector of the pure state in $\mathcal{H}$ with the spin direction $a$. The corresponding density operator, which is the one-dimensional projector onto $\psi(a)$, is

$$S(a) = |\psi(a)\rangle\langle\psi(a)| = \frac{1}{2}\,(I + \sigma(a)).$$

Exercise 2.17. Prove

$$\sigma(a_1)\sigma(a_2) = (a_1 \cdot a_2)\, I + i\sigma(a_1 \times a_2). \tag{2.16}$$

Written for the basis vectors this amounts to

$$\sigma_x^2 = I, \qquad \sigma_y^2 = I, \qquad \sigma_z^2 = I,$$

$$\sigma_x \sigma_y = i\sigma_z, \qquad \sigma_y \sigma_z = i\sigma_x, \qquad \sigma_z \sigma_x = i\sigma_y. \tag{2.17}$$

Taking into account that $\operatorname{Tr} \sigma(a) = 0$, the relation (2.16) implies the formula for the mean value of $\sigma(b)$:

$$\operatorname{Tr} S(a)\sigma(b) = a \cdot b.$$

Another corollary of (2.16) is

$$[\sigma(a_1), \sigma(a_2)] = 2i\sigma(a_1 \times a_2), \tag{2.18}$$

which implies that $\sigma(a_1), \sigma(a_2)$ are compatible if and only if $a_1 = \pm a_2$. In particular, the spin components $\sigma_x, \sigma_y, \sigma_z$ are incompatible observables.
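
The product rule (2.16) and the commutation relation (2.18) are easy to confirm numerically. The following Python sketch does so for one pair of directions; the helper names and the two sample vectors are assumptions made for the illustration.

```python
def sigma(a):
    """Spin observable sigma(a) = ax*sx + ay*sy + az*sz as the matrix (2.14)."""
    ax, ay, az = a
    return [[complex(az), complex(ax, -ay)],
            [complex(ax, ay), complex(-az)]]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def max_abs_diff(p, q):
    return max(abs(p[i][j] - q[i][j]) for i in range(2) for j in range(2))

a1, a2 = (0.0, 0.6, 0.8), (1.0, 0.0, 0.0)      # sample unit vectors (assumption)
product = matmul(sigma(a1), sigma(a2))
s_cross = sigma(cross(a1, a2))

# Right-hand side of (2.16): (a1 . a2) I + i sigma(a1 x a2).
rhs = [[dot(a1, a2) * (i == j) + 1j * s_cross[i][j] for j in range(2)]
       for i in range(2)]
err_product = max_abs_diff(product, rhs)

# Commutator compared with 2i sigma(a1 x a2), as in (2.18).
reverse = matmul(sigma(a2), sigma(a1))
commutator = [[product[i][j] - reverse[i][j] for j in range(2)] for i in range(2)]
err_commutator = max_abs_diff(
    commutator, [[2j * s_cross[i][j] for j in range(2)] for i in range(2)])
commutator_size = max(abs(c) for row in commutator for c in row)
print(err_product, err_commutator, commutator_size)
```

Here $a_1 \times a_2 \ne 0$, so the commutator does not vanish and the two spin components are incompatible.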


² In physical applications there is a dimensional factor $\hbar/2$, which can be removed by an appropriate choice of units.

2.3.3 The uncertainty relation

If $X, Y$ are two operators, we can write

$$XY = X \circ Y + \frac{1}{2}\,[X, Y],$$

where $X \circ Y = \frac{1}{2}(XY + YX)$ is the symmetrized or Jordan product of $X, Y$.

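Let $S$ be some state. For an arbitrary collection $\mathbf{X} = \{X_1, \dots, X_n\}$ of sharp real observables we introduce two real matrices, one symmetric and the other skew-symmetric,

$$B_S(\mathbf{X}) = \left[ \operatorname{Tr} S\, \mathring{X}_j \circ \mathring{X}_k \right]_{j,k=1,\dots,n}, \qquad C_S(\mathbf{X}) = \left[ i \operatorname{Tr} S\, [X_j, X_k] \right]_{j,k=1,\dots,n}, \tag{2.19}$$

where $\mathring{X}_j = X_j - I\, \mathsf{E}_S(X_j)$. The matrix $B_S(\mathbf{X})$ is called the covariance matrix, while $C_S(\mathbf{X})$ is called the commutation matrix of $\mathbf{X}$. We have

$$B_S(\mathbf{X}) \ge \pm \frac{i}{2}\, C_S(\mathbf{X}) \tag{2.20}$$

in the sense of an inequality between complex Hermitian matrices. Indeed, the Hermitian matrix

$$B_S(\mathbf{X}) - \frac{i}{2}\, C_S(\mathbf{X}) = \left[ \operatorname{Tr} S\, \mathring{X}_j \mathring{X}_k \right]_{j,k=1,\dots,n}$$

is positive semidefinite, since for arbitrary $c_j \in \mathbb{C}$

$$\sum_{j,k=1}^{n} \bar{c}_j c_k \operatorname{Tr} S\, \mathring{X}_j \mathring{X}_k = \operatorname{Tr} S Z^* Z \ge 0,$$

where $Z = \sum_{j=1}^{n} c_j \mathring{X}_j$. This produces the inequality (2.20) with the + sign. The inequality with the minus sign follows after taking the transposes.

For two observables $X, Y$, with $\mathring{X}_1 = X - I\, \mathsf{E}_S(X)$ and $\mathring{X}_2 = Y - I\, \mathsf{E}_S(Y)$, the inequality (2.20) is equivalent to the Schrödinger-Robertson uncertainty relation

$$\mathsf{D}_S(X)\, \mathsf{D}_S(Y) \ge \left\{ \mathsf{E}_S\big( (X - I\, \mathsf{E}_S(X)) \circ (Y - I\, \mathsf{E}_S(Y)) \big) \right\}^2 + \frac{1}{4} \left| \mathsf{E}_S[X, Y] \right|^2, \tag{2.21}$$

where

$$\mathsf{D}_S(X) = \operatorname{Tr} S\, (X - I\, \mathsf{E}_S(X))^2 \tag{2.22}$$

is the variance of the sharp real observable $X$ in the state $S$. The quantity

$$\mathsf{E}_S\big( (X - I\, \mathsf{E}_S(X)) \circ (Y - I\, \mathsf{E}_S(Y)) \big) \tag{2.23}$$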

represents the covariance of $X, Y$ in the state $S$. If the observables $X, Y$ are compatible, this quantity coincides with the covariance of random variables, in the sense of probability theory, with respect to their joint probability distribution. In this case, $\mathsf{E}_S[X, Y] = 0$ and (2.21) reduces to the Cauchy-Schwarz inequality for the covariance of random variables. However, if $X, Y$ are not compatible, they are not measurable in a single experiment, and the variances $\mathsf{D}_S(X)$, $\mathsf{D}_S(Y)$ in the uncertainty relation refer to two different measurements performed over different representatives of the same statistical ensemble, while the covariance is just a formal characteristic of the state.
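
As a hedged numerical check of the Schrödinger-Robertson relation (2.21), the sketch below evaluates both sides for the qubit observables $X = \sigma_x$, $Y = \sigma_y$ in a mixed state of the form (2.8). The helper names and the Stokes parameters are assumptions chosen for the illustration.

```python
def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expect(state, op):
    """Tr S A for 2x2 matrices; may be complex when A is not Hermitian."""
    m = matmul(state, op)
    return complex(m[0][0] + m[1][1])

def centered(state, op):
    """A - I * E_S(A) for a Hermitian operator A."""
    mean = expect(state, op).real
    return [[op[i][j] - (mean if i == j else 0.0) for j in range(2)]
            for i in range(2)]

SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
ax, ay, az = 0.3, -0.4, 0.5                    # hypothetical Stokes parameters
S = [[(1 + az) / 2, complex(ax, -ay) / 2],
     [complex(ax, ay) / 2, (1 - az) / 2]]

X0, Y0 = centered(S, SX), centered(S, SY)
xy, yx = matmul(X0, Y0), matmul(Y0, X0)

var_x = expect(S, matmul(X0, X0)).real         # D_S(X), equation (2.22)
var_y = expect(S, matmul(Y0, Y0)).real         # D_S(Y)
cov = expect(S, [[(xy[i][j] + yx[i][j]) / 2 for j in range(2)]
                 for i in range(2)]).real      # covariance (2.23)
comm = expect(S, [[xy[i][j] - yx[i][j] for j in range(2)]
                  for i in range(2)])          # E_S[X, Y], purely imaginary

lhs = var_x * var_y
rhs = cov ** 2 + abs(comm) ** 2 / 4
print(lhs, rhs)                                # lhs >= rhs, as (2.21) requires
```

For these two observables the inequality reduces to $1 - a_x^2 - a_y^2 \ge a_z^2$, i.e. to the Bloch ball condition $|a| \le 1$, with equality exactly for pure states.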

Exercise 2.18. Prove the noncommutative Cauchy-Schwarz inequality

$$|\operatorname{Tr} S X^* Y|^2 \le \operatorname{Tr} S X^* X \cdot \operatorname{Tr} S Y^* Y, \tag{2.24}$$

for an arbitrary state $S$ and operators $X, Y$ in $\mathcal{H}$.

2.3.4 Convex structure of observables 

The set of all (finite-valued) quantum observables in a Hilbert space $\mathcal{H}$ will be denoted by $\mathfrak{M}(\mathcal{H})$. Given a Hilbert space $\mathcal{H}$, the sets $\mathfrak{S}(\mathcal{H})$ of quantum states and $\mathfrak{M}(\mathcal{H})$ of quantum observables, by construction, together with the generalized Born-von Neumann statistical rule (2.13), satisfy the axioms of a separated statistical model. However, this model has an additional convexity structure in $\mathfrak{M}(\mathcal{H})$, which is to some extent similar to that of $\mathfrak{S}(\mathcal{H})$.

Let $\{M^j\}$ be a finite collection of observables with the same space of outcomes $\mathcal{X}$. Given a probability distribution $\{p_j\}$, we can define the mixture $M = \{M_x; x \in \mathcal{X}\}$ of these observables in an obvious way:

$$M_x = \sum_j p_j M_x^j, \qquad x \in \mathcal{X}. \tag{2.25}$$

Thus, the set $\mathfrak{M}_{\mathcal{X}}$ of all observables with a given space of outcomes $\mathcal{X}$ is a convex set. As is the case with states, mixtures of observables describe measurements with fluctuations in the classical parameters of the measuring procedure.

The following result describes the relation between observables without classical 
randomness and sharp observables. 

Theorem 2.19. Any sharp observable $M \in \mathfrak{M}_{\mathcal{X}}$ is an extreme point of $\mathfrak{M}_{\mathcal{X}}$. In the reverse direction, an extreme observable $M \in \mathfrak{M}_{\mathcal{X}}$ with commuting components, $[M_x, M_{x'}] = 0$, is sharp.

Proof. Let $M$ be sharp, and assume $M = pM^1 + (1 - p)M^2$, $0 < p < 1$. In this case, similar to (2.7),

$$pM_x^1(I - M_x^1) + (1 - p)M_x^2(I - M_x^2) + p(1 - p)(M_x^1 - M_x^2)^2 = 0, \tag{2.26}$$

and therefore $M_x^1 = M_x^2 = M_x$, and $M$ is extreme.

Now notice that in all cases $M_x \le I$ and hence $M_x^2 \le M_x$. Taking some $x \in \mathcal{X}$ we have

$$M_x = \frac{1}{2} M_x^2 + \frac{1}{2}\,(2M_x - M_x^2); \tag{2.27}$$

$$M_{x'} = \frac{1}{2} M_{x'}(I + M_x) + \frac{1}{2} M_{x'}(I - M_x), \qquad x' \ne x. \tag{2.28}$$

If $[M_x, M_{x'}] = 0$, these relations represent $M$ as a convex combination (with equal weights) of two other observables. If $M$ is extreme, they should coincide with $M$, in particular $M_x^2 = M_x$. Hence, $M$ is sharp. □
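
The decomposition (2.27)-(2.28) can be seen at work in the commutative case, where all components are diagonal and can be stored as lists of diagonal entries. The sketch below uses a hypothetical two-outcome unsharp observable and the illustrative names `split`, `M_plus`, `M_minus`.

```python
# Diagonal entries of the components of a commuting unsharp observable (assumed).
M = {"x": [0.7, 0.2], "y": [0.3, 0.8]}

def split(M, x):
    """Observables appearing with weights 1/2 in (2.27)-(2.28) for outcome x."""
    plus, minus = {}, {}
    for label, diag in M.items():
        if label == x:
            plus[label] = [m * m for m in diag]                 # M_x^2
            minus[label] = [2 * m - m * m for m in diag]        # 2 M_x - M_x^2
        else:
            plus[label] = [d * (1 + m) for d, m in zip(diag, M[x])]
            minus[label] = [d * (1 - m) for d, m in zip(diag, M[x])]
    return plus, minus

M_plus, M_minus = split(M, "x")

# Each half is again a resolution of the identity, and their average is M.
sum_plus = [sum(M_plus[k][i] for k in M_plus) for i in range(2)]
sum_minus = [sum(M_minus[k][i] for k in M_minus) for i in range(2)]
average = {k: [(p + q) / 2 for p, q in zip(M_plus[k], M_minus[k])] for k in M}
print(sum_plus, sum_minus, average)
```

Because $M_x^2 \ne M_x$ here, the two halves differ from $M$, which therefore is not extreme, in agreement with the theorem.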

It follows that in the classical case extreme observables coincide with the sharp ones, giving them a very clear characterization as observables without randomness in the measuring procedure. In the quantum statistical model, things are not so simple (and much more interesting). We call the two-valued observables tests (they are also called propositions, questions, effects in the literature, and play a central role in various axiomatic approaches although, as we have just seen, they have a rather special property as concerns the important issue of extremity). The set of extreme quantum observables is exhausted by sharp observables only in this case of two outcomes. This fact follows from the theorem, because any test necessarily has commuting components $\{M_0, M_1 = I - M_0\}$. Thus, any extreme test is completely determined by the projector $P = M_0$.

For observables with more than two values, we consider the following construction.

Definition 2.20. Let $\{|\psi_x\rangle\}$ be an arbitrary finite collection of not necessarily normalized vectors in $\mathcal{H}$ such that $\sum_x |\psi_x\rangle\langle\psi_x| = I$. Such a collection is called an overcomplete system.

If $\{|\psi_x\rangle\}$ is an overcomplete system, an arbitrary vector $|\psi\rangle \in \mathcal{H}$ can be represented as a linear combination

$$|\psi\rangle = \sum_x c_x |\psi_x\rangle \quad \text{with} \quad c_x = \langle\psi_x|\psi\rangle, \tag{2.29}$$

where the coefficients $\{c_x\}$ need not be unique, because the vectors $|\psi_x\rangle$ may be linearly dependent. There also is a corresponding matrix representation of the operators. In a $d$-dimensional Hilbert space, overcomplete systems with more than $d$ vectors always exist, and can be built from any complete (possibly linearly dependent) system as follows. Let $\{|\varphi_x\rangle\}$ be a complete system of vectors in $\mathcal{H}$. The corresponding Gram operator is defined by

$$G = \sum_x |\varphi_x\rangle\langle\varphi_x|. \tag{2.30}$$

Completeness implies that $G$ is nondegenerate. Now, $\{|\psi_x\rangle\}$, with $|\psi_x\rangle = G^{-1/2}|\varphi_x\rangle$, is an overcomplete system because

$$\sum_x |\psi_x\rangle\langle\psi_x| = G^{-1/2} \sum_x |\varphi_x\rangle\langle\varphi_x|\, G^{-1/2} = I. \tag{2.31}$$

An overcomplete system determines a quantum observable $M$ with the components

$$M_x = |\psi_x\rangle\langle\psi_x|, \qquad x \in \mathcal{X}. \tag{2.32}$$

In particular, for any orthonormal basis $\{e_x\}$, the observable $M = \{|e_x\rangle\langle e_x|\}$ is sharp and hence an extreme observable.
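
The construction (2.30)-(2.31) can be tried out in the real two-dimensional case. The following sketch assumes three arbitrarily chosen real vectors $\varphi_x$ and uses the closed form $\sqrt{G} = (G + sI)/t$ with $s = \sqrt{\det G}$, $t = \sqrt{\operatorname{Tr} G + 2s}$, which is valid for positive definite $2 \times 2$ matrices by the Cayley-Hamilton identity; all names are illustrative.

```python
import math

def outer_sum(vectors):
    """Sum of |v><v| over real two-component vectors, as in (2.30)."""
    return [[sum(v[i] * v[j] for v in vectors) for j in range(2)]
            for i in range(2)]

def inverse_sqrt_2x2(g):
    """G^(-1/2) for a real symmetric positive definite 2x2 matrix G."""
    s = math.sqrt(g[0][0] * g[1][1] - g[0][1] * g[1][0])
    t = math.sqrt(g[0][0] + g[1][1] + 2 * s)
    r = [[(g[0][0] + s) / t, g[0][1] / t],
         [g[1][0] / t, (g[1][1] + s) / t]]          # r is the square root of G
    det_r = r[0][0] * r[1][1] - r[0][1] * r[1][0]
    return [[r[1][1] / det_r, -r[0][1] / det_r],
            [-r[1][0] / det_r, r[0][0] / det_r]]

phi = [(1.0, 0.0), (1.0, 1.0), (0.0, 2.0)]          # complete system (assumption)
G = outer_sum(phi)                                  # Gram operator (2.30)
K = inverse_sqrt_2x2(G)
psi = [(K[0][0] * v[0] + K[0][1] * v[1],
        K[1][0] * v[0] + K[1][1] * v[1]) for v in phi]
resolution = outer_sum(psi)                         # should equal I, see (2.31)
print(G, resolution)
```

The three vectors $\psi_x$ are linearly dependent, yet the operators $|\psi_x\rangle\langle\psi_x|$ sum to the identity and thus define an observable with three outcomes on a two-dimensional space.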

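Theorem 2.21. An observable (2.32) is extreme if and only if the operators $M_x$ are linearly independent in the real vector space $\mathfrak{B}_h(\mathcal{H})$.

Proof. Let $M$ be extreme and assume that

$$\sum_x c_x |\psi_x\rangle\langle\psi_x| = 0 \tag{2.33}$$

with real $c_x$. For $\varepsilon > 0$ small enough, we define

$$M_x^{\pm} = (1 \pm \varepsilon c_x) M_x \ge 0, \qquad x \in \mathcal{X}.$$

In this case, $M^{\pm}$ are observables and, by construction, $M = \frac{1}{2} M^{+} + \frac{1}{2} M^{-}$. Therefore, $M_x^{+} = M_x^{-} = M_x$ since $M$ is extreme. Thus, (2.33) implies $c_x = 0$, i.e. the components of $M$ are linearly independent.

Conversely, let

$$|\psi_x\rangle\langle\psi_x| = pM_x^1 + (1 - p)M_x^2$$

be a convex decomposition of $M$. In this case, $0 \le pM_x^1 \le |\psi_x\rangle\langle\psi_x|$. By Lemma 2.14,

$$M_x^1 |\psi_x\rangle\langle\psi_x| = |\psi_x\rangle\langle\psi_x| M_x^1 = \langle\psi_x|\psi_x\rangle M_x^1,$$

whence $M_x^1 = \lambda_x |\psi_x\rangle\langle\psi_x|$, with $\lambda_x = \langle\psi_x|M_x^1|\psi_x\rangle / \langle\psi_x|\psi_x\rangle^2$. Now, $\sum_x \lambda_x |\psi_x\rangle\langle\psi_x| = I$, i.e.

$$\sum_x (\lambda_x - 1)\, |\psi_x\rangle\langle\psi_x| = 0.$$

Due to the linear independence of the $M_x$, $\lambda_x = 1$ and $M_x^1 = M_x$ for all $x$, which means that $M$ is extreme. □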

From the result of Exercise 1.5, it follows that the dimensionality of the real space of Hermitian operators is equal to $d^2$. Thus, all overcomplete systems with $n$ linearly independent components, $d < n \le d^2$, determine unsharp extremal observables. A concrete example of such an observable with $n = 3$, $d = 2$ will be considered in the next section.

2.4 Statistical discrimination between quantum states

2.4.1 Formulation of the problem 

In this section, we consider a statistical decision problem which will allow us to pass to the optimal information transmission via quantum channels in Part II of the book.

Let a quantum system be prepared in one of the states $S_x$, $x = 1, \dots, n$. The observer is allowed to perform arbitrary measurements aimed at determining, in the best possible way, in which of these states, given a priori, the system was actually prepared. This kind of decision problem is typical for mathematical statistics and its applications in communication theory, where we deal with optimal signal detection or estimation (see Section 3.3.1). On the other hand, in high-precision experiments, researchers are already able to operate with elementary quantum systems such as single ions, atoms, and photons. This leads to potentially important applications, such as quantum communication and quantum cryptography. An important issue is that of extracting the maximum possible information about the state of a given quantum system. In proposals currently under discussion for application to quantum computing, the information is written into states of elementary quantum cells, named qubits, and is read via quantum measurements. From a statistical point of view, a measurement gives an estimate for the quantum state, either as a whole or for some of its components (parameters), and the problem of finding the most informative measurement arises.

The statistics of the whole measurement procedure is described by an observable, i.e. by a resolution of the identity $M = \{M_x\}$ in the system space $\mathcal{H}$. The probability to make a decision $y$, under the condition that the system was in the state $S_x$, is equal to $p_M(y|x) = \operatorname{Tr} S_x M_y$. In particular, the probability of making a correct decision is equal to $p_M(x|x)$. Let us additionally assume that the states $S_x$ appear with probabilities $\pi_x$ (e.g. in the case of equiprobable states $\pi_x = 1/n$). In this case, the average probability of making a correct decision is equal to

$$\mathcal{P}\{M\} = \sum_{x=1}^{n} \pi_x\, p_M(x|x),$$

and the problem is to maximize it with respect to all possible observables $M$.

2.4.2 Optimal observables 

Denoting $W_x = \pi_x S_x \ge 0$, the average probability of a correct decision can be written explicitly as an affine function

$$\mathcal{P}\{M\} = \sum_{x=1}^{n} \operatorname{Tr} W_x M_x$$

on the convex set of observables with $n$ outcomes

$$\mathfrak{M}_n = \Big\{ M = \{M_x\} : M_x \ge 0,\ \sum_{x=1}^{n} M_x = I \Big\}.$$

Maximization of an affine function defined on a compact convex set is a typical linear programming problem.

Theorem 2.22. The average probability of a correct decision $\mathcal{P}\{M\}$ attains its maximum at an extreme point of the set $\mathfrak{M}_n$. An observable $M^0$ is optimal if and only if there exists a Hermitian operator $\Lambda^0$ such that

i. $(\Lambda^0 - W_x) M_x^0 = 0$;

ii. $\Lambda^0 \ge W_x$.

The following duality relation holds:

$$\max\{\mathcal{P}\{M\} : M \in \mathfrak{M}_n\} = \min\{\operatorname{Tr} \Lambda : \Lambda \ge W_x,\ x = 1, \dots, n\}. \tag{2.34}$$

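Proof. Since $\mathcal{P}\{M\}$ is a continuous affine function on the compact convex set $\mathfrak{M}_n$, the first statement follows from the result of Exercise 1.7.

Let us first prove the sufficiency of the conditions i, ii. Let $\Lambda^0$ be a Hermitian operator satisfying the conditions i, ii. By using condition ii and the property (1.13), we obtain for every $M \in \mathfrak{M}_n$

$$\mathcal{P}\{M\} = \operatorname{Tr} \sum_x W_x M_x \le \operatorname{Tr} \sum_x \Lambda^0 M_x = \operatorname{Tr} \Lambda^0. \tag{2.35}$$

From i, we have $\Lambda^0 M_x^0 = W_x M_x^0$. If we sum over $x$ and take the trace of the resulting equation, we obtain

$$\operatorname{Tr} \Lambda^0 = \operatorname{Tr} \sum_x \Lambda^0 M_x^0 = \operatorname{Tr} \sum_x W_x M_x^0 = \mathcal{P}\{M^0\}. \tag{2.36}$$

Hence, we obtain

$$\mathcal{P}\{M\} \le \mathcal{P}\{M^0\} \quad \text{for all } M, \tag{2.37}$$

and therefore $M^0$ maximizes $\mathcal{P}$. Notice that

$$\Lambda^0 = \sum_x W_x M_x^0 = \sum_x M_x^0 W_x.$$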
X X 

Now, we prove the necessity. Put $M_x = A_x^2$, where $A_x$ are Hermitian operators satisfying the condition $\sum_x A_x^2 = I$. Applying the method of Lagrange multipliers, we can reduce the problem of maximizing $\mathcal{P}\{M\}$ on the set $\mathfrak{M}_n$ to the problem of maximizing the function

$$\operatorname{Tr} \sum_x W_x A_x^2 - \operatorname{Tr} \Lambda \Big( \sum_x A_x^2 - I \Big), \tag{2.38}$$

where $\Lambda$ is a Hermitian operator, over all collections of Hermitian operators $A_x$. Here, the matrix elements of the Hermitian operator $\Lambda$ represent the array of the Lagrange multipliers for this problem. Let $A_x^0$ be the optimal collection. In addition, let $A_x = A_x^0 + \varepsilon Y_x$, with Hermitian $Y_x$, and consider (2.38) as a function of $\varepsilon$. By considering the coefficients of $\varepsilon$ and $\varepsilon^2$, we obtain the conditions

$$\operatorname{Tr}\big[(W_x - \Lambda)A_x^0 + A_x^0(W_x - \Lambda)\big] Y_x = 0,$$

$$\operatorname{Tr}(W_x - \Lambda) Y_x^2 \le 0$$

for an arbitrary Hermitian $Y_x$, i.e.

$$(W_x - \Lambda)A_x^0 + A_x^0(W_x - \Lambda) = 0, \qquad \Lambda - W_x \ge 0.$$

The second inequality is just the condition ii of the theorem. By putting $M_x^0 = (A_x^0)^2$, the first relation implies $\operatorname{Tr}(\Lambda - W_x)M_x^0 = 0$, which, together with the second inequality, leads to the condition i. □

Exercise 2.23. Prove that the operator Lagrange multiplier $\Lambda$ is the unique solution $\Lambda^0$ of the dual problem on the right side of (2.34).

Let us illustrate the meaning and applications of these conditions with some examples.

Example 2.24. Consider the classical case where all density operators $S_x$, and hence the operators $W_x$, commute. In this case, there is a common orthonormal basis $\{|\omega\rangle\}$ in which they are all diagonal,

$$W_x = \sum_\omega W_x(\omega)\, |\omega\rangle\langle\omega|.$$

In this case, the dual problem has the unique solution

$$\Lambda^0 = \sum_\omega \max_x W_x(\omega)\, |\omega\rangle\langle\omega|,$$

where $\max_x W_x(\omega)$ is the upper envelope of the family of functions $W_x(\omega)$; $x = 1, \dots, n$. A solution of the initial problem is given by the formula

$$M_x^0 = \sum_\omega 1_{\Omega_x}(\omega)\, |\omega\rangle\langle\omega|,$$

where $1_{\Omega_x}$ denotes the indicator of the subset $\Omega_x$, and the subsets $\Omega_x \subset \{\omega : \Lambda^0(\omega) = W_x(\omega)\}$ are assumed to be non-intersecting and to constitute a partition of the set $\Omega = \{\omega\}$.

This solution amounts to the principle of maximal likelihood in classical statistics. The decision $x$ should be taken for those observations $\omega$ for which the posterior gain $W_x(\omega)$ is maximal. Thus, in the classical case, the optimal observable can always be chosen to be sharp. This is directly related to the fact that in the commutative case extreme points of the set $\mathfrak{M}_n$ are precisely the orthogonal resolutions of the identity (cf. Theorem 2.19).
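
In the commutative case the optimal rule is elementary to compute. The Python sketch below, with made-up priors and distributions on a three-point space, picks for every $\omega$ a decision $x$ maximizing $W_x(\omega)$ and compares the resulting probability of a correct decision with the value $\operatorname{Tr} \Lambda^0 = \sum_\omega \max_x W_x(\omega)$ of the dual problem.

```python
priors = [0.5, 0.5]                                  # a priori probabilities (assumed)
likelihoods = [[0.7, 0.2, 0.1],                      # distribution of state 0 (assumed)
               [0.1, 0.3, 0.6]]                      # distribution of state 1 (assumed)

# W[x][w] is the diagonal entry W_x(omega) = pi_x * p_x(omega).
W = [[pi * p for p in row] for pi, row in zip(priors, likelihoods)]
n_points = len(W[0])

# Sharp optimal observable: decide x on the set where W_x attains the envelope.
decision = [max(range(len(W)), key=lambda x: W[x][w]) for w in range(n_points)]
p_correct = sum(W[decision[w]][w] for w in range(n_points))

# Value of the dual problem: trace of the upper envelope.
dual_value = sum(max(W[x][w] for x in range(len(W))) for w in range(n_points))
print(decision, p_correct, dual_value)
```

The two numbers coincide, as the duality relation (2.34) requires, and the decision sets form a partition of the phase space.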



Figure 2.2. Discrimination between two pure states. 


Example 2.25 (Discrimination between two states). Let $S_0, S_1$ be two density operators, with $\pi_0, \pi_1$ their a priori probabilities. An observable (test) is given by a resolution of the identity $\{M_0, M_1\}$, so that $M_0 + M_1 = I$. This case can be reduced to the previous one by representing

$$W_0 = \pi_1 S_1 + W_0', \qquad W_1 = \pi_1 S_1 + W_1',$$

where the operators $W_0' = \pi_0 S_0 - \pi_1 S_1$, $W_1' = 0$ commute. Thus

$$\mathcal{P}\{M\} = \pi_1 + \operatorname{Tr}\,(W_0' M_0 + W_1' M_1), \tag{2.39}$$

and

$$\Lambda^0 = \max\{\pi_0 S_0 - \pi_1 S_1,\ 0\} = (\pi_0 S_0 - \pi_1 S_1)_+. \tag{2.40}$$

Any optimal $M_0$ has the form

$$M_0^0 = 1_{(0,\infty)}(\pi_0 S_0 - \pi_1 S_1) + X_0, \tag{2.41}$$

where the first term is the projector onto the positive part of the spectrum of $\pi_0 S_0 - \pi_1 S_1$, and the second is an operator supported by the null subspace of $\pi_0 S_0 - \pi_1 S_1$, such that $0 \le X_0 \le I$. The maximum is given by

$$\max_M \mathcal{P}\{M\} = \pi_1 + \operatorname{Tr}\,(\pi_0 S_0 - \pi_1 S_1)_+. \tag{2.42}$$

By using the relation

$$(\pi_0 S_0 - \pi_1 S_1)_+ = \frac{1}{2}\,|\pi_0 S_0 - \pi_1 S_1| + \frac{1}{2}\,(\pi_0 S_0 - \pi_1 S_1),$$

we obtain

$$\max_M \mathcal{P}\{M\} = \frac{1}{2}\,\big(1 + \|\pi_0 S_0 - \pi_1 S_1\|_1\big).$$

In particular, taking $\pi_0 = \pi_1 = \frac{1}{2}$, we have

$$\max_M \mathcal{P}\{M\} = \frac{1}{2}\,\Big(1 + \frac{1}{2}\,\|S_0 - S_1\|_1\Big),$$

so that the more distinguishable the density operators $S_0$ and $S_1$ are in the sense of the trace norm, the greater the maximum will be.

Proposition 2.26. Let $S_0 = |\psi_0\rangle\langle\psi_0|$, $S_1 = |\psi_1\rangle\langle\psi_1|$ be pure states. In this case, the maximum of $\mathcal{P}\{M\}$ is given by

$$\max_M \mathcal{P}\{M\} = \frac{1}{2}\,\Big(1 + \sqrt{1 - 4\pi_0 \pi_1 |\langle\psi_0|\psi_1\rangle|^2}\Big), \tag{2.43}$$

and is achieved on a sharp observable. In particular, if $\pi_0 = \pi_1 = 1/2$, the maximum is equal to

$$\max_M \mathcal{P}\{M\} = \frac{1}{2}\,\Big(1 + \sqrt{1 - |\langle\psi_0|\psi_1\rangle|^2}\Big). \tag{2.44}$$

Proof. Consider the eigenvalue problem for the rank-two operator

$$\pi_0 |\psi_0\rangle\langle\psi_0| - \pi_1 |\psi_1\rangle\langle\psi_1|.$$

This operator may have eigenvalue zero, corresponding to the orthogonal complement to its support, which is just the linear span $\mathcal{L}$ of the vectors $|\psi_0\rangle, |\psi_1\rangle$. The relevant part of the optimal observable is also supported by $\mathcal{L}$, so we can restrict ourselves to the eigenvectors that are linear combinations

$$|\psi\rangle = c_0 |\psi_0\rangle + c_1 |\psi_1\rangle.$$

Substituting this into the eigenvector equation results in

$$\pi_0\,(c_0 + \langle\psi_0|\psi_1\rangle c_1) = \lambda c_0; \tag{2.45}$$

$$-\pi_1\,(\langle\psi_1|\psi_0\rangle c_0 + c_1) = \lambda c_1, \tag{2.46}$$

whence the eigenvalues are

$$\lambda_{0,1} = \frac{1}{2}\,\Big[\pi_0 - \pi_1 \pm \sqrt{1 - 4\pi_0 \pi_1 |\langle\psi_0|\psi_1\rangle|^2}\Big],$$

with the corresponding orthonormal basis of eigenvectors $|e_0\rangle, |e_1\rangle$. Now,

$$\big\|\pi_0 |\psi_0\rangle\langle\psi_0| - \pi_1 |\psi_1\rangle\langle\psi_1|\big\|_1 = \lambda_0 - \lambda_1 = \sqrt{1 - 4\pi_0 \pi_1 |\langle\psi_0|\psi_1\rangle|^2},$$

and we get (2.43). □

The sharp optimal observable in the subspace $\mathcal{L}$ is just $\{|e_0\rangle\langle e_0|, |e_1\rangle\langle e_1|\}$. If $\pi_0 = \pi_1 = 1/2$, the optimal basis is situated symmetrically with respect to $|\psi_0\rangle, |\psi_1\rangle$ (see Figure 2.2).
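
The agreement between the trace-norm expression of Example 2.25 and the closed formula (2.43) can be checked numerically. The sketch below assumes two real unit vectors at an angle $\pi/3$ and unequal priors; these values and all names are chosen only for illustration.

```python
import math

def projector(v):
    """|v><v| for a real two-component unit vector v."""
    return [[v[i] * v[j] for j in range(2)] for i in range(2)]

pi0, pi1 = 0.6, 0.4                                  # a priori probabilities (assumed)
alpha = math.pi / 3                                  # angle between the vectors (assumed)
psi0, psi1 = (1.0, 0.0), (math.cos(alpha), math.sin(alpha))
S0, S1 = projector(psi0), projector(psi1)

# Eigenvalues of D = pi0*S0 - pi1*S1 from its trace and determinant.
D = [[pi0 * S0[i][j] - pi1 * S1[i][j] for j in range(2)] for i in range(2)]
tr = D[0][0] + D[1][1]
det = D[0][0] * D[1][1] - D[0][1] * D[1][0]
root = math.sqrt(tr * tr / 4 - det)
lam0, lam1 = tr / 2 + root, tr / 2 - root

trace_norm = abs(lam0) + abs(lam1)
p_trace_norm = 0.5 * (1 + trace_norm)                # formula of Example 2.25

overlap = sum(x * y for x, y in zip(psi0, psi1))
p_formula = 0.5 * (1 + math.sqrt(1 - 4 * pi0 * pi1 * overlap ** 2))   # (2.43)
print(p_trace_norm, p_formula)
```

One eigenvalue is positive and the other negative, so the trace norm equals $\lambda_0 - \lambda_1$, exactly as in the proof.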



Figure 2.3. Trine on the plane. 



Example 2.27. On a plane (considered as a real subspace of two-dimensional com¬ 
plex space), consider an “equiangular” configuration of the three vectors (see Fig¬ 
ure 2.3) 


I fj) = 


cos 


2 jn' 


2 jn 
3 


j =0,1,2. 


(2.47) 


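The corresponding density matrices $S_j = |\psi_j\rangle\langle\psi_j|$ describe the states of a two-level
quantum system such as a linearly polarized photon or a spin 1/2 particle.

We have

$$S_j = \frac{1}{2}\left(I + \begin{bmatrix} \cos\frac{4j\pi}{3} & \sin\frac{4j\pi}{3} \\ \sin\frac{4j\pi}{3} & -\cos\frac{4j\pi}{3} \end{bmatrix}\right), \tag{2.48}$$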


so that

$$\sum_{j=0}^{2} S_j = \frac{3}{2}\,I,$$

because $\sum_{j=0}^{2} e^{i 4j\pi/3} = 0$.

It follows that $M_k^0 = \frac{2}{3}S_k$; $k = 0, 1, 2$, is the resolution of the identity that corresponds
to the overcomplete system $\left\{\sqrt{\tfrac{2}{3}}\,|\psi_k\rangle;\ k = 0, 1, 2\right\}$. The corresponding
observable is extremal, since it satisfies the conditions of Theorem 2.21.

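Let us show that, in the case of equiprobable states $\pi_j = 1/3$, the observable $\{M_k^0\}$ is
optimal, by checking the conditions of Theorem 2.22. Since $S_j^2 = S_j$,

$$\Lambda_0 = \sum_{j=0}^{2} \pi_j S_j M_j^0 = \frac{2}{9}\sum_{j=0}^{2} S_j = \frac{I}{3},$$

so that condition ii. is satisfied: $I/3 = \Lambda_0 \geq S_j/3$. Condition i. is also satisfied
since

$$(\Lambda_0 - \pi_j S_j) M_j^0 = \frac{2}{9}(I - S_j) S_j = 0.$$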

Thus, $\max_M \mathcal{P}\{M\} = \operatorname{Tr}\Lambda_0 = 2/3$. Let us now find the maximum over all sharp
observables with three outcomes. A nontrivial orthogonal resolution of the identity
with three components in the two-dimensional space has the form $M_0 = |e_0\rangle\langle e_0|$,
$M_1 = |e_1\rangle\langle e_1|$, $M_2 = 0$, where $|e_0\rangle$, $|e_1\rangle$ is an arbitrary basis. Thus, the problem
reduces to the optimal discrimination of the two equiprobable states $S_0$, $S_1$. Applying
relation (2.44) to the case $\langle\psi_0|\psi_1\rangle = -\frac{1}{2}$, we obtain

$$\max_{M\ \text{sharp}} \mathcal{P}\{M\} = \frac{2}{3}\cdot\frac{1 + \sqrt{3}/2}{2} < \frac{2}{3} = \max_{M \in \mathfrak{M}} \mathcal{P}\{M\}.$$


Thus, in the quantum state discrimination, using unsharp observables can lead to an 
improvement in comparison with the sharp ones. Let us stress that in the similar 
classical problem (of discrimination between probability distributions) unsharpness 
or randomization can never lead to such an improvement. From a geometrical point 
of view, the reason for this phenomenon is, of course, that in the quantum case not all 
extremal observables (among which the optimal one is found) are sharp ones. 
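
The comparison in Example 2.27 can be reproduced numerically. The sketch below (Python with NumPy; all variable names are illustrative assumptions, not notation from the text) evaluates the success probability of the unsharp observable $M_k^0 = \frac{2}{3}S_k$ and scans the orthonormal bases of the real plane for the best sharp observable with $M_2 = 0$. Restricting the scan to real bases and to the pair $S_0$, $S_1$ is an assumption justified by the symmetry of the configuration.

```python
import numpy as np

# Trine vectors (2.47) and their density matrices.
angles = 2 * np.pi * np.arange(3) / 3
psis = [np.array([np.cos(t), np.sin(t)]) for t in angles]
S = [np.outer(p, p) for p in psis]

# Unsharp observable M_k^0 = (2/3) S_k; its components must sum to the identity.
M = [2.0 / 3.0 * s for s in S]
resolution = sum(M)
p_unsharp = sum(np.trace(S[j] @ M[j]) for j in range(3)) / 3

def p_sharp(theta):
    """Success probability of the sharp observable {|e0><e0|, |e1><e1|, 0}."""
    e0 = np.array([np.cos(theta), np.sin(theta)])
    e1 = np.array([-np.sin(theta), np.cos(theta)])
    # The third state is never identified, since M_2 = 0.
    return (np.dot(e0, psis[0]) ** 2 + np.dot(e1, psis[1]) ** 2) / 3

thetas = np.linspace(0.0, np.pi, 20001)
best_sharp = max(p_sharp(t) for t in thetas)
closed_form = (2.0 / 3.0) * (1.0 + np.sqrt(3.0) / 2.0) / 2.0

print(p_unsharp, best_sharp, closed_form)
```

The scan should reproduce the closed-form value obtained from (2.44), which stays strictly below 2/3.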


2.5 Notes and references 

1. Among the texts devoted to the mathematical foundations of Quantum Mechanics
let us mention the monographs of von Neumann [212], Mackey [153], Segal [181],
Faddeev and Yakubovsky [57]. A survey of axiomatic approaches is given in the
paper of Wightman [219] and also in the Comments to Ch. VIII of the book of Reed
and Simon [171]. One long-sought aim of any axiomatics is a derivation of the
Hilbert space formalism. Such a derivation within the framework of the operational
approach (see below) was realized, in the finite dimensional case, in the paper of
Araki [7] (for a more recent approach, see Hardy [75]). However, this is a separate
topic; for our purposes it will suffice to postulate the state space $\mathfrak{S} = \mathfrak{S}(\mathcal{H})$.

The description of mixed states by density operators (matrices) was introduced
independently by von Neumann, Weyl, and Landau (see footnote 172 in Section IV.2 of
von Neumann’s book [212]).

The axiomatic approach, in which the central role is played by the duality between the 
two partially ordered linear spaces, generated by a convex set of quantum states, and 
by the order interval $[0, I]$ of two-valued observables (effects or tests), was developed
by the school of Ludwig [152], [61], see also Davies [48] (this approach is usually 
called operational). A detailed investigation of the statistical structure of quantum 
theory, based on the operational approach, is undertaken in the books [107], [99], 
where one can find a detailed bibliography. 

In Proposition 2.1, we had to assume the existence of the maximal observable for the
sole reason that, for simplicity, from the beginning we restricted ourselves to
observables with finite sets of outcomes. With an appropriate generalization the existence of
the maximal observable (in general, with an infinite $\Omega$) can be deduced from
compatibility of all observables of the model. Essentially, this is the message of
Kolmogorov’s famous extension theorem for a random process defined by a system of
compatible finite-dimensional distributions. A detailed study of the notion of stochastic
compatibility, including the proof that the separated model, in which all observables
are stochastically compatible, can be embedded within the Wald model, can be found
in the paper [96].

A brief introduction to the basic notions of quantum theory targeted at quantum
information science is given in the books of Nielsen and Chuang [158] and Hayashi [78].
The papers of Fuchs [63], D’Ariano [40], Masanes and Müller [156] elaborate on
information-theoretical approaches to the foundations of quantum theory.

2. The generalized concept of a quantum observable presented here was developed 
by Ludwig [152], Davies and Lewis [48], and Holevo [107]. As concerns the convex 
structure of the set of observables, see Kraus [139]. In analysis and in signal theory, 
overcomplete systems are known under the name “(rigid) frames”. 

The spin of a quantum particle is intrinsically related to representations of the group
of rotations of three dimensional Euclidean space [218], [59], [153]. Results related
to Theorem 2.21 are discussed in the paper of Davies [47].

3. The problem of discriminating the two pure quantum states was treated, in 1968,
in the paper of Bakut and Shchurov [14]. Such problems arise naturally in connection
with the detection of weak light sources, and in quantum optics [86]. A physical
device that achieves the bound (2.43) for two coherent states of a radiation field (see
Chapter 12), called the Dolinar receiver, was described in [86]. It uses photon counting
as a measurement, together with an ingenious feedback from the output of the photon
detector to the input field of the counter.

General detection and estimation theory for quantum states was developed in the
1970s in works of Helstrom [85], Holevo [91], Yuen and Lax [230], Stratonovich [204],
Belavkin [17], and other authors. Example 2.27, with the three equiangular
states, was proposed in [94]. For further details and extensive references, see the
monographs [86], [107]. New interest in quantum estimation theory emerged in
connection with the ideas of quantum computation. Any such computation is
completed by measuring parameters of the final state of the quantum computer, which
should be maximally precise. The achievements of modern quantum estimation theory,
including asymptotic theory, are presented in the books of Hayashi [78] and
Petz [168]. The Bayes problem is a linear programming problem (see [154], [172]),
and Theorem 2.22 could be proved on the basis of the corresponding duality theorem. An
interesting variation of this problem proposed by Peres [165] concerns the unambiguous
discrimination between quantum states when an inconclusive result is allowed.
For a survey of the recent status of the field see e.g. Herzog [87].



Chapter 3 

Composite systems and entanglement 


3.1 Composite systems 

3.1.1 Tensor products 

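The new possibilities that are offered by quantum information are to a large extent due
to the unusual features of composite quantum systems. Let $\mathcal{H}_j$, $j = 1, 2$, be the Hilbert
spaces of two finite quantum systems with inner products $\langle\cdot|\cdot\rangle_j$. The combination
of the two systems is described by the tensor product of the Hilbert spaces, which is
defined as follows.

According to Section 1.1, the element $\psi \in \mathcal{H}$ defines the antilinear function
$\psi(\varphi) = \langle\varphi|\psi\rangle$ of the argument $\varphi \in \mathcal{H}$. For any two elements $\psi_j \in \mathcal{H}_j$; $j = 1, 2$,
we denote by $\psi_1 \otimes \psi_2$ the bi-antilinear function of the arguments $\varphi_1 \in \mathcal{H}_1$, $\varphi_2 \in \mathcal{H}_2$,
defined by the relation

$$(\psi_1 \otimes \psi_2)(\varphi_1, \varphi_2) = \langle\varphi_1|\psi_1\rangle_1 \langle\varphi_2|\psi_2\rangle_2.$$

Consider the vector space $\mathcal{L}$ of finite linear combinations of such functions
$\sum_j c_j\, \psi_1^j \otimes \psi_2^j$. Introduce the inner product in $\mathcal{L}$ by letting

$$\langle\varphi_1 \otimes \varphi_2|\psi_1 \otimes \psi_2\rangle = \langle\varphi_1|\psi_1\rangle_1 \langle\varphi_2|\psi_2\rangle_2$$

on the generating elements $|\psi_1 \otimes \psi_2\rangle = |\psi_1\rangle \otimes |\psi_2\rangle$ and extending, by linearity, to
$\mathcal{L}$.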


Exercise 3.1. Show that such a linear extension is possible, and that it is unique
and satisfies all the properties of an inner product.

Definition 3.2. The space $\mathcal{L}$ with the inner product defined above is called the tensor
product $\mathcal{H}_1 \otimes \mathcal{H}_2$ of the Hilbert spaces $\mathcal{H}_1$, $\mathcal{H}_2$.

Exercise 3.3. Prove the following statement: if $\{e_j^1\}$, $\{e_k^2\}$ are orthonormal
bases in $\mathcal{H}_1$, $\mathcal{H}_2$, then $\{e_j^1 \otimes e_k^2\}$ is an orthonormal basis in $\mathcal{H}_1 \otimes \mathcal{H}_2$ and
$\dim \mathcal{H}_1 \otimes \mathcal{H}_2 = \dim \mathcal{H}_1 \cdot \dim \mathcal{H}_2$.

It follows that any vector $\psi \in \mathcal{H}_1 \otimes \mathcal{H}_2$ is uniquely represented in the form

$$|\psi\rangle = \sum_{j,k} c_{jk}\, |e_j^1\rangle \otimes |e_k^2\rangle,$$

where $c_{jk}$ are complex numbers. Summing with respect to $j$ and denoting
$|\psi_k\rangle = \sum_{j=1}^{d_1} c_{jk} |e_j^1\rangle \in \mathcal{H}_1$, we obtain

$$|\psi\rangle = \sum_{k=1}^{d_2} |\psi_k\rangle \otimes |e_k^2\rangle,$$

which means that the space $\mathcal{H}_1 \otimes \mathcal{H}_2$ is isomorphic to the direct orthogonal sum
$\mathcal{H}_1 \oplus \cdots \oplus \mathcal{H}_1$ of $d_2 = \dim \mathcal{H}_2$ copies of the space $\mathcal{H}_1$.

For the operators $X_j$ in the spaces $\mathcal{H}_j$ we define the tensor product by putting

$$(X_1 \otimes X_2)(\psi_1 \otimes \psi_2) = X_1\psi_1 \otimes X_2\psi_2,$$

and extending by linearity. Let a basis in $\mathcal{H}_2$ be fixed, so that $\mathcal{H}_1 \otimes \mathcal{H}_2$ is realized
as $\mathcal{H}_1 \oplus \cdots \oplus \mathcal{H}_1$. In this case, $X_1 \otimes X_2$ is realized as the block matrix $[X_1 x_2^{jk}]$,
where $[x_2^{jk}]$ is the matrix of the operator $X_2$. Now, a general operator $X$ in $\mathcal{H}_1 \otimes \mathcal{H}_2$
is given by a block matrix $[X_{jk}]$ whose entries are operators in $\mathcal{H}_1$.

Let us recall that in Section 1.4 we denoted by $\mathfrak{B}_h(\mathcal{H})$ the real linear space of all
Hermitian operators (i.e. sharp real observables) in $\mathcal{H}$. From the result of Exercise 1.5,
in the case of complex Hilbert spaces $\mathcal{H}_1$, $\mathcal{H}_2$ we have

$$\dim \mathfrak{B}_h(\mathcal{H}_1 \otimes \mathcal{H}_2) = \dim \mathfrak{B}_h(\mathcal{H}_1) \cdot \dim \mathfrak{B}_h(\mathcal{H}_2),$$

while in the case of real Hilbert spaces $\mathcal{H}_1$, $\mathcal{H}_2$ the dimensionalities of spaces of
symmetric operators are related by the inequality $>$.¹

Exercise 3.4. Prove the following statement: if $S_j$ are the density operators in
$\mathcal{H}_j$, then $S_1 \otimes S_2$ is a density operator in $\mathcal{H}_1 \otimes \mathcal{H}_2$.

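Definition 3.5. Let $T$ be an operator in $\mathcal{H} = \mathcal{H}_1 \otimes \mathcal{H}_2$. The partial trace of $T$ (with
respect to $\mathcal{H}_2$) is defined as an operator $\operatorname{Tr}_{\mathcal{H}_2} T$ in $\mathcal{H}_1$, satisfying

$$\langle\varphi|\operatorname{Tr}_{\mathcal{H}_2} T|\psi\rangle = \sum_k \langle\varphi \otimes e_k^2|T|\psi \otimes e_k^2\rangle, \qquad \varphi, \psi \in \mathcal{H}_1.$$

If $S_{12}$ is a density operator corresponding to a state of the composite system,
$S_1 = \operatorname{Tr}_{\mathcal{H}_2} S_{12}$ (and the similarly defined $S_2$) are called partial states of the first or second
subsystem, resp.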

Partial states are the noncommutative analog of marginal probability distributions 
in probability theory. 


¹ It is possible to consider quaternionic Hilbert spaces, in which case the inequality changes to $<$,
see [7].

Exercise 3.6. Show that this definition does not depend on the choice of the
orthonormal basis $\{e_k^2\}$. If $T = T_1 \otimes T_2$ then $\operatorname{Tr}_{\mathcal{H}_2}(T_1 \otimes T_2) = (\operatorname{Tr} T_2)\,T_1$.

Let $\mathcal{H}_1 \otimes \mathcal{H}_2$ be realized as $\mathcal{H}_1 \oplus \cdots \oplus \mathcal{H}_1$, so that $T = [T_{jk}]$; then
$\operatorname{Tr}_{\mathcal{H}_2} T = \sum_{j=1}^{d_2} T_{jj}$ and $\operatorname{Tr}_{\mathcal{H}_1} T = [\operatorname{Tr} T_{jk}]$.
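
The partial trace of Definition 3.5 is easy to experiment with. The following is a minimal sketch in Python with NumPy; the helper `partial_trace` is a hypothetical name introduced here. It assumes NumPy's `np.kron` ordering, in which the blocks are indexed by the first factor; this is the opposite labeling to the realization $\mathcal{H}_1 \oplus \cdots \oplus \mathcal{H}_1$ used in the text, but the resulting operators are the same.

```python
import numpy as np

def partial_trace(T, d1, d2, over):
    """Partial trace of an operator T on H1 (x) H2 (np.kron index ordering).

    over=2 traces out the second factor, over=1 traces out the first.
    """
    T4 = T.reshape(d1, d2, d1, d2)          # indices (j, k, j', k')
    if over == 2:
        return np.einsum("jkmk->jm", T4)    # sum over k = k'
    if over == 1:
        return np.einsum("jkjm->km", T4)    # sum over j = j'
    raise ValueError("over must be 1 or 2")

rng = np.random.default_rng(0)
d1, d2 = 2, 3
T1 = rng.normal(size=(d1, d1)) + 1j * rng.normal(size=(d1, d1))
T2 = rng.normal(size=(d2, d2)) + 1j * rng.normal(size=(d2, d2))
T = np.kron(T1, T2)

tr2 = partial_trace(T, d1, d2, over=2)      # should equal (Tr T2) T1
tr1 = partial_trace(T, d1, d2, over=1)      # should equal (Tr T1) T2

# Partial state of the vector (1/sqrt(d)) sum_j |j> (x) |j>.
d = 3
psi = np.eye(d).reshape(d * d) / np.sqrt(d)
reduced = partial_trace(np.outer(psi, psi.conj()), d, d, over=2)
```

The last lines illustrate that a pure state of the composite system can have a mixed partial state, here the chaotic state $I/d$.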

3.1.2 Naimark’s dilation 

The fundamental relation between orthogonal and nonorthogonal resolutions of the 
identity is established by the following theorem. 

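Theorem 3.7 (Naimark). Let $\{M_x\}$ be a resolution of the identity in $\mathcal{H}$ with $m$ outcomes,
$\dim \mathcal{H} = d$. In this case, there exist a Hilbert space $\tilde{\mathcal{H}}$ of dimensionality
$\dim \tilde{\mathcal{H}} \leq md$, an isometry $V : \mathcal{H} \to \tilde{\mathcal{H}}$, and an orthogonal resolution of the identity
$\{E_x\}$ in $\tilde{\mathcal{H}}$ such that

$$M_x = V^* E_x V, \qquad x \in \mathcal{X}. \tag{3.1}$$

An isometry is a linear map that preserves the norms and hence the inner products
of vectors in the Hilbert spaces. For arbitrary $|\varphi\rangle, |\psi\rangle \in \mathcal{H}$, $\langle\varphi|V^*V|\psi\rangle = \langle\varphi|\psi\rangle$
holds, i.e. $V^*V = I$. The isometric embedding $V$ allows us to identify $\mathcal{H}$ with the
subspace $V\mathcal{H}$ of the space $\tilde{\mathcal{H}}$ and consider $\mathcal{H} \subset \tilde{\mathcal{H}}$. Then $M_x$ can be considered as
a restriction of $E_x$ onto $\mathcal{H}$:

$$E_x = \begin{bmatrix} M_x & \cdots \\ \cdots & \cdots \end{bmatrix}.$$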

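Proof. The construction of $\tilde{\mathcal{H}}$ is done in two steps. First, we define vectors $|\Psi\rangle \in \mathcal{H}^m$
with

$$|\Psi\rangle = \begin{bmatrix} |\psi_1\rangle \\ \vdots \\ |\psi_m\rangle \end{bmatrix}, \qquad |\psi_x\rangle \in \mathcal{H}. \tag{3.2}$$

In addition, we define the pre-inner product, i.e. a form satisfying all the properties of
the inner product except nondegeneracy, by the relation

$$\langle\Psi'|\Psi\rangle = \sum_x \langle\psi'_x|M_x|\psi_x\rangle. \tag{3.3}$$

The properties follow from the definition of the resolution of the identity.

In order to ensure that the norm induced by this pre-inner product is non-degenerate,
one considers the quotient space $\tilde{\mathcal{H}} = \mathcal{H}^m / \mathcal{H}_0$, where

$$\mathcal{H}_0 = \{\Psi_0 \in \mathcal{H}^m : \langle\Psi_0|\Psi_0\rangle = 0\},$$

i.e. we identify the vectors whose difference has zero norm. Now, the map
$V : \mathcal{H} \to \mathcal{H}^m$, defined as

$$V|\psi\rangle = \begin{bmatrix} |\psi\rangle \\ \vdots \\ |\psi\rangle \end{bmatrix}, \tag{3.4}$$

is an isometry because

$$\langle\varphi|V^*V|\psi\rangle = \sum_{x=1}^{m} \langle\varphi|M_x|\psi\rangle = \langle\varphi|\psi\rangle. \tag{3.5}$$

For the orthogonal resolution of the identity $\{E_x\}$ in $\tilde{\mathcal{H}}$, we introduce
$E_x|\Psi\rangle = [0, \ldots, |\psi_x\rangle, \ldots, 0]^T$, where the only non-zero component is on the $x$-th place. Hence,

$$\langle V\varphi|E_x|V\psi\rangle = \langle\varphi|M_x|\psi\rangle, \qquad \varphi, \psi \in \mathcal{H},$$

which concludes the proof. □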
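
A dilation satisfying (3.1) can be written down explicitly in coordinates. The sketch below (Python with NumPy) is an illustration under a stated assumption: instead of the quotient-space construction of the proof, it uses the familiar variant $V = \sum_x |x\rangle \otimes \sqrt{M_x}$ with $E_x = |x\rangle\langle x| \otimes I$ acting in a space of dimension $md$. The function names `psd_sqrt` and `naimark_dilation` are hypothetical, and the trine observable of Example 2.27 serves as test data.

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a positive semidefinite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(M)
    vals = np.clip(vals, 0.0, None)          # remove tiny negative rounding errors
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def naimark_dilation(Ms):
    """Return an isometry V and orthogonal projectors E_x with M_x = V* E_x V."""
    m, d = len(Ms), Ms[0].shape[0]
    # V stacks the blocks sqrt(M_x): an (m*d) x d matrix.
    V = np.vstack([psd_sqrt(M) for M in Ms])
    Es = []
    for x in range(m):
        e = np.zeros((m, m))
        e[x, x] = 1.0
        Es.append(np.kron(e, np.eye(d)))     # projector onto the x-th block
    return V, Es

# Test data: the unsharp trine observable M_k = (2/3)|psi_k><psi_k|.
angles = 2 * np.pi * np.arange(3) / 3
Ms = [2.0 / 3.0 * np.outer([np.cos(t), np.sin(t)], [np.cos(t), np.sin(t)])
      for t in angles]
V, Es = naimark_dilation(Ms)
recovered = [V.conj().T @ E @ V for E in Es]
```

Here `V` is an isometry because $\sum_x M_x = I$, and each `recovered[x]` should coincide with `Ms[x]`.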

We now consider an important corollary of Naimark’s Dilation Theorem, which 
provides us with a statistical interpretation of an arbitrary resolution of the identity and 
establishes that the generalized definition of a quantum observable is just an extension 
of the standard one. 

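Corollary 3.8. Let $\{M_x\}$ be an observable in $\mathcal{H}$. In this case, there exist a Hilbert
space $\mathcal{H}_0$, a unit vector $\psi_0 \in \mathcal{H}_0$ and a sharp observable $\{E_x\}$ in $\mathcal{H} \otimes \mathcal{H}_0$, such that

$$M_x = \operatorname{Tr}_{\mathcal{H}_0} (I \otimes |\psi_0\rangle\langle\psi_0|) E_x. \tag{3.6}$$

Proof. According to Naimark’s Theorem, $M_x = V^* E_x V$, where $V : \mathcal{H} \to \tilde{\mathcal{H}}$ is an
isometric embedding. Let us identify $\mathcal{H}$ with the subspace $V\mathcal{H} \subset \tilde{\mathcal{H}}$. By extending,
if necessary, the space $\tilde{\mathcal{H}}$, we can take $\dim \tilde{\mathcal{H}} = \dim \mathcal{H} \cdot d_0$, and hence

$$\tilde{\mathcal{H}} \simeq \mathcal{H} \oplus \cdots \oplus \mathcal{H} \simeq \mathcal{H} \otimes \mathcal{H}_0,$$

where $\mathcal{H}_0 = \mathbb{C}^{d_0}$ is the coordinate complex Hilbert space of dimensionality $d_0$, and
$\mathcal{H}$ is identified with the first term in the direct sum, or with the subspace $\mathcal{H} \otimes |\psi_0\rangle$,
where $|\psi_0\rangle = [1, 0, \ldots, 0]^T$. In this case,

$$(I \otimes |\psi_0\rangle\langle\psi_0|) E_x = \begin{bmatrix} M_x & \cdots \\ 0 & 0 \end{bmatrix},$$

so that (3.6) indeed holds. □

Thus, every observable can be realized as a sharp observable in the composite system
by introducing an auxiliary system in a fixed state $S_0 = |\psi_0\rangle\langle\psi_0|$. Such a way
of realization can be called quantum randomization.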

In classical statistics, randomization, i.e. the introduction of a “roulette”, can be a 
useful trick, but it can never increase the information about the state of the observed 
system. From the result of Section 2.4.2 it follows that in quantum statistics this 
is no longer true. Paradoxically, quantum randomization allows us to extract more 
information about the state of the observed system than is contained in the sharp 
observables when we are not using an auxiliary system. 


3.1.3 Schmidt decomposition and purification 

Consider a state $S_{12}$ in the Hilbert space of the composite system $\mathcal{H}_1 \otimes \mathcal{H}_2$.

Definition 3.9. A pure state $S_{12}$ is called entangled if it cannot be represented as a
product state $S_1 \otimes S_2$.

Thus, any unit vector $\psi_{12} \in \mathcal{H}_1 \otimes \mathcal{H}_2$ that is a nontrivial superposition of product
vectors generates a pure entangled state. An important example is the maximally
entangled state, generated by the vector

$$|\psi_{12}\rangle = \frac{1}{\sqrt{d}} \sum_{j=1}^{d} |e_j^1\rangle \otimes |e_j^2\rangle \tag{3.7}$$

in the space $\mathcal{H}_1 \otimes \mathcal{H}_2$, where $d = \dim \mathcal{H}_1 = \dim \mathcal{H}_2$ and $\{e_j^{1,2}\}$ are orthonormal
bases in $\mathcal{H}_{1,2}$. The meaning of this name will be clarified later in Section 7.5.
Meanwhile, notice that the partial states of the maximally entangled state are the chaotic
states in $\mathcal{H}_1$ and $\mathcal{H}_2$:

$$\operatorname{Tr}_{\mathcal{H}_2} |\psi_{12}\rangle\langle\psi_{12}| = \frac{I}{d},$$

and similarly for $\mathcal{H}_2$.

We shall often make use of the following result: 


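Theorem 3.10 (Schmidt Decomposition). Let $S_{12} = |\psi\rangle\langle\psi|$ be a pure state in the
Hilbert space $\mathcal{H}_1 \otimes \mathcal{H}_2$ and $S_1 = \operatorname{Tr}_{\mathcal{H}_2} S_{12}$, $S_2 = \operatorname{Tr}_{\mathcal{H}_1} S_{12}$ be its partial states.
In this case, the density operators $S_1$, $S_2$ have the same non-zero eigenvalues $\lambda_j$.
Furthermore,

$$|\psi\rangle = \sum_j \sqrt{\lambda_j}\, |e_j^1\rangle \otimes |e_j^2\rangle, \tag{3.8}$$

where $\{e_j^{1,2}\}$ are orthonormal eigenvectors of $S_1$, resp. of $S_2$.

Proof. Taking the orthonormal basis $\{e_j^1\}$ of eigenvectors of $S_1$ in $\mathcal{H}_1$, we can write $|\psi\rangle$ as

$$|\psi\rangle = \sum_j |e_j^1\rangle \otimes |h_j^2\rangle \tag{3.9}$$

with some vectors $|h_j^2\rangle \in \mathcal{H}_2$. Computation of the partial trace provides us with

$$\sum_{j,k} \langle h_k^2|h_j^2\rangle\, |e_j^1\rangle\langle e_k^1| = \sum_j \lambda_j |e_j^1\rangle\langle e_j^1| = S_1, \tag{3.10}$$

and therefore $\langle h_k^2|h_j^2\rangle = \lambda_j \delta_{jk}$ (Kronecker’s delta). Thus, we can put
$|e_j^2\rangle = \frac{1}{\sqrt{\lambda_j}} |h_j^2\rangle$ for $\lambda_j > 0$, and complete this orthonormal system to a basis in $\mathcal{H}_2$, consisting of
eigenvectors of the operator $S_2$. □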
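
In coordinates, the Schmidt decomposition (3.8) amounts to a singular value decomposition of the coefficient matrix $[c_{jk}]$ of $|\psi\rangle$. The sketch below (Python with NumPy) illustrates this; the function name `schmidt_decomposition`, the dimensions and the random test vector are assumptions made for the example, and the `np.kron` index ordering is used.

```python
import numpy as np

def schmidt_decomposition(psi, d1, d2):
    """Return Schmidt coefficients lam and orthonormal systems e1, e2 with
    psi = sum_r sqrt(lam[r]) * kron(e1[r], e2[r])."""
    C = np.asarray(psi).reshape(d1, d2)               # coefficient matrix c_{jk}
    U, s, Vh = np.linalg.svd(C, full_matrices=False)  # C = U diag(s) Vh
    lam = s ** 2
    e1 = [U[:, r] for r in range(len(s))]
    e2 = [Vh[r, :] for r in range(len(s))]
    return lam, e1, e2

rng = np.random.default_rng(2)
d1, d2 = 2, 3
psi = rng.normal(size=d1 * d2) + 1j * rng.normal(size=d1 * d2)
psi /= np.linalg.norm(psi)

lam, e1, e2 = schmidt_decomposition(psi, d1, d2)
rebuilt = sum(np.sqrt(l) * np.kron(a, b) for l, a, b in zip(lam, e1, e2))

# Partial states expressed through the coefficient matrix.
C = psi.reshape(d1, d2)
S1 = C @ C.conj().T        # Tr over the second factor
S2 = C.T @ C.conj()        # Tr over the first factor
```

The non-zero eigenvalues of `S1` and `S2` should both coincide with `lam`, as the theorem asserts.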

Conversely, an arbitrary mixed state of a quantum system can be “purified” i.e. 
extended to a pure state of a larger system: 

Theorem 3.11 (Purification). Let $S_1$ be a state in $\mathcal{H}_1$. In this case, there exist a
reference Hilbert space $\mathcal{H}_2$ of the same dimensionality as $\mathcal{H}_1$ and a pure state $|\psi\rangle\langle\psi|$
in $\mathcal{H}_1 \otimes \mathcal{H}_2$ such that $S_1 = \operatorname{Tr}_{\mathcal{H}_2} |\psi\rangle\langle\psi|$.

For any other pure state $|\psi'\rangle\langle\psi'|$ in $\mathcal{H}_1 \otimes \mathcal{H}_2$ that has this property, there is a
unitary operator $U_2$ in $\mathcal{H}_2$ such that $|\psi'\rangle = (I_1 \otimes U_2)|\psi\rangle$.

Proof. Diagonalize $S_1$ and take $|\psi\rangle$ as in (3.8), with an arbitrary basis $\{e_j^2\}$ in a Hilbert
space $\mathcal{H}_2$ isomorphic to $\mathcal{H}_1$. Any other $|\psi'\rangle$ has a decomposition (3.8) with a
different basis in $\mathcal{H}_2$, and any two bases in the same space are related by a unitary
transformation. □

Corollary 3.12. Let $S_{12}$ be a state in $\mathcal{H}_1 \otimes \mathcal{H}_2$ such that the partial state $S_1$ is pure.
In this case, $S_{12} = S_1 \otimes S_2$.

Proof. For a pure state $S_{12}$ the statement obviously follows from Theorem 3.10. For
an arbitrary state $S_{12}$, we can apply the same argument to its purification. □

From Theorem 3.10, it also follows that the purification is essentially unique. Any
two purifications with reference spaces $\mathcal{H}_2$, $\mathcal{H}_2'$ such that $\dim \mathcal{H}_2 \leq \dim \mathcal{H}_2'$, are
related by an isometric embedding of $\mathcal{H}_2$ into $\mathcal{H}_2'$.

Remarkably, there is one universal Hilbert space that contains purifications of all
states in $\mathcal{H}$. Consider the Hilbert space $L^2(\mathcal{H})$ of all operators $X, Y, \ldots$ in $\mathcal{H}$,
equipped with the inner product

$$(X, Y) = \operatorname{Tr} X^* Y.$$

Then the linear correspondence

$$|\psi\rangle\langle\varphi| \leftrightarrow |\psi\rangle \otimes \langle\varphi|, \tag{3.11}$$

where $\langle\varphi| \in \mathcal{H}^*$ (the dual space of linear functions on $\mathcal{H}$), is uniquely extended to the
isometry between the Hilbert spaces $L^2(\mathcal{H})$ and $\mathcal{H} \otimes \mathcal{H}^*$. Denoting by $L_A$ (resp.
$R_B$) the operation of left multiplication by $A$ (resp. of right multiplication by $B$), we
have for $X = |\psi\rangle\langle\varphi|$

$$L_A R_B X = AXB \leftrightarrow |A\psi\rangle \otimes \langle B^*\varphi|.$$

For an arbitrary density operator $S$ in $\mathcal{H}$ consider the unit vector $\sqrt{S}\,W \in L^2(\mathcal{H})$,
where $W$ is an arbitrary unitary in $\mathcal{H}$. Then

$$\left(\sqrt{S}\,W,\ L_A R_B \sqrt{S}\,W\right) = \operatorname{Tr} \sqrt{S} A \sqrt{S}\, W B W^*,$$

whence, letting $B = I$,

$$\operatorname{Tr} SA = \left(\sqrt{S}\,W,\ (A \otimes I_{\mathcal{H}^*}) \sqrt{S}\,W\right).$$

Therefore $\sqrt{S}\,W$ is a purification of $S$ in $L^2(\mathcal{H})$ identified with $\mathcal{H} \otimes \mathcal{H}^*$ via the
correspondence (3.11). Moreover, all purifications of $S$ are obtained in this way.

Notice that purification is a mathematical construct that does not correspond to any
physical operation. Indeed, it would imply cloning the quantum state $S_1$.
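
The purification $\sqrt{S}\,W$ in $L^2(\mathcal{H})$ can be illustrated with matrices. The following is a minimal sketch in Python with NumPy; the random density matrix, the random unitary obtained from a QR factorization and the helper `psd_sqrt` are assumptions made for the example. An operator $X$ is treated as a vector with inner product $\operatorname{Tr} X^*Y$, and tracing out the dual factor corresponds to forming $XX^*$.

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a positive semidefinite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(M)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

rng = np.random.default_rng(1)
d = 3

# Random density operator S and random unitary W (illustrative test data).
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
S = G @ G.conj().T
S /= np.trace(S).real
W, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))

X = psd_sqrt(S) @ W                       # candidate purification sqrt(S) W
norm_sq = np.trace(X.conj().T @ X).real   # squared norm in L^2(H)
reduced = X @ X.conj().T                  # partial state on the first factor

# Expectation of an observable A acting on the first factor only.
H = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
A = H + H.conj().T
lhs = np.trace(S @ A)
rhs = np.trace(X.conj().T @ A @ X)
```

Both the partial state and the expectation values reproduce those of $S$, independently of the choice of $W$.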

Definition 3.13. A general mixed state $S_{12} \in \mathfrak{S}(\mathcal{H}_1 \otimes \mathcal{H}_2)$ is called separable
(unentangled) if it belongs to the convex hull of the set of all product states, i.e. if it has
the form of the convex combination

$$S_{12} = \sum_x \pi_x\, S_1^x \otimes S_2^x,$$

where $S_j^x \in \mathfrak{S}(\mathcal{H}_j)$; $j = 1, 2$ and $\{\pi_x\}$ is a finite probability distribution. The states
that are not separable are called entangled.

A simple necessary condition for separability is that the state $S_{12}$ has a positive
partial transpose (PPT). An arbitrary operator in $\mathcal{H}_1 \otimes \mathcal{H}_2$ can be represented as
$A_{12} = \sum_j A_1^j \otimes A_2^j$. Its partial transpose with respect to the second system is
correctly defined as

$$A_{12}^{T_2} = \sum_j A_1^j \otimes \left[A_2^j\right]^{T_2},$$

where $T_2$ is a transposition in the second system. In general, this condition is not
sufficient, so there exist PPT entangled states.
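
The PPT condition is straightforward to test numerically. The sketch below (Python with NumPy) is an illustration only: the helper `partial_transpose_2` is a hypothetical name, the `np.kron` index ordering is assumed, and the two test states (a maximally entangled two-qubit state and a classically correlated mixture) are chosen for the example.

```python
import numpy as np

def partial_transpose_2(A, d1, d2):
    """Transpose the second tensor factor of an operator on H1 (x) H2."""
    A4 = A.reshape(d1, d2, d1, d2)                    # indices (j, k, j', k')
    return A4.transpose(0, 3, 2, 1).reshape(d1 * d2, d1 * d2)

# Consistency with the definition on product operators.
rng = np.random.default_rng(3)
A1 = rng.normal(size=(2, 2))
A2 = rng.normal(size=(3, 3))
product_check = np.allclose(partial_transpose_2(np.kron(A1, A2), 2, 3),
                            np.kron(A1, A2.T))

# Maximally entangled two-qubit state: the partial transpose is not positive.
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
entangled = np.outer(phi, phi)
min_eig_entangled = np.linalg.eigvalsh(partial_transpose_2(entangled, 2, 2)).min()

# Separable mixture (|00><00| + |11><11|)/2: the partial transpose stays positive.
separable = np.diag([0.5, 0.0, 0.0, 0.5])
min_eig_separable = np.linalg.eigvalsh(partial_transpose_2(separable, 2, 2)).min()
```

A negative eigenvalue of the partial transpose certifies entanglement; a nonnegative spectrum is only a necessary condition for separability, as stated above.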
3.2 Quantum entanglement vs “local realism”

3.2.1 Paradox of Einstein-Podolsky-Rosen and Bell’s inequalities

Entanglement, along with complementarity, constitutes the principal structural
difference between the classical and the quantum description of systems.

The pure state of a classical system is described by a point in the phase space
(practically, this is a complete set of parameter values describing “inner properties” of the
system). If the classical system under consideration consists of several subsystems, 
an arbitrary pure state is necessarily a product of the pure states of the subsystems. 
By fixing the values of the parameters of the system we ipso facto fix the values of the 
parameters of all its subsystems. In this case, there is no need to involve a statistical 
treatment. 

For quantum systems the situation is different. As Corollary 3.12 shows, for any
pure entangled state $S_{12}$ of a composite system, the partial states $S_{1,2}$ are necessarily
mixed, i.e. they require a statistical description.

Such behavior is quite unusual from a classical point of view and, as will be shown 
in this section, is entirely incompatible with the classical mode of description (which 
is often called “realism”). 

The key example of the unusual behavior of composite quantum systems was
provided by Einstein, Podolsky and Rosen (EPR), in 1935. In the 1950s Bohm presented
it in a clearer form, using spin degrees of freedom instead of spatial ones, and in the
1960s it was substantially clarified by J. S. Bell, who suggested a fundamental
inequality, which holds for classical composite systems satisfying a natural requirement
of “locality” (separability), but is violated in quantum systems and which, in principle,
can be verified experimentally.

Consider two particles with spin 1/2, each of which is described by the Hilbert
space $\mathcal{H}$ with $\dim \mathcal{H} = 2$. Initially, the particles interact, resulting in the common
spin state described by the vector

$$|\psi\rangle = \frac{1}{\sqrt{2}}\left[\,|\uparrow\rangle \otimes |\downarrow\rangle - |\downarrow\rangle \otimes |\uparrow\rangle\,\right],$$

where the basis vectors

$$|\uparrow\rangle = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad |\downarrow\rangle = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

describe the states of each particle, with the spin along the positive (resp. negative)
direction of the $z$ axis. Usually, one writes briefly

$$|\psi\rangle = \frac{1}{\sqrt{2}}\left[\,|\uparrow\downarrow\rangle - |\downarrow\uparrow\rangle\,\right]. \tag{3.12}$$

Every component denotes the state with the spins in opposite directions, while $|\psi\rangle$
is their superposition, which cannot be represented as a tensor product of state vectors
related to different particles. This state is called “singlet” and is a canonical example
of an entangled state of two quantum systems.

Suppose that the particles travel apart along the $y$ axis at a macroscopic distance,
while preserving the entangled state of their spins. Consider an experiment in which
measurements of spin observables $\sigma(\vec{a})$ for one particle and $\sigma(\vec{b})$ for
another particle are made simultaneously in remote laboratories A and B. Operators
$X = \sigma(\vec{a}) \otimes I$, $Y = I \otimes \sigma(\vec{b})$ commute. Hence, the corresponding observables are
compatible and their covariance is given by expression (2.23).

Exercise 3.14. By using expressions for the matrix elements

$$\langle\uparrow|\sigma(\vec{a})|\uparrow\rangle = a_z, \quad \langle\downarrow|\sigma(\vec{a})|\uparrow\rangle = a_x + i a_y, \quad \langle\downarrow|\sigma(\vec{a})|\downarrow\rangle = -a_z, \tag{3.13}$$

which follow from (2.14), show that in the singlet state the mean value and the
variance of the spin observable in any direction are given by

$$\mathsf{E}(\sigma(\vec{a}) \otimes I) = 0, \qquad \mathsf{D}(\sigma(\vec{a}) \otimes I) = 1,$$

while the covariance between the spins is given by the formula

$$\langle\psi|\sigma(\vec{a}) \otimes \sigma(\vec{b})|\psi\rangle = -\vec{a} \cdot \vec{b}. \tag{3.14}$$

It follows that for $\vec{b} = \vec{a}$, the correlation coefficient is equal to $-1$. Hence, there
is a deterministic relation $a = -b$ between the outcomes $a$, $b$ of the measurements.
The formulas (3.13) and the fact that $\sigma(\vec{a})^2 = I$ imply

$$\langle\psi|\left[\sigma(\vec{a}) \otimes I + I \otimes \sigma(\vec{a})\right]^2|\psi\rangle = 0,$$

whence

$$\left[\sigma(\vec{a}) \otimes I + I \otimes \sigma(\vec{a})\right]|\psi\rangle = 0. \tag{3.15}$$

If the spins were classical random variables, this would mean that measuring the spin
of the first particle in some arbitrarily chosen direction $\vec{a}$ would “instantaneously”
bring the spin of the second particle into the opposite direction.


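Exercise 3.15. Prove the following statement: for any direction $\vec{a}$

$$|\psi\rangle = \frac{\epsilon}{\sqrt{2}}\left[\,|\vec{a}\rangle \otimes |-\vec{a}\rangle - |-\vec{a}\rangle \otimes |\vec{a}\rangle\,\right], \tag{3.16}$$

where $\epsilon$ is an inessential factor of modulus one.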


Thus, one has to choose between the following alternatives: 

1) In quantum mechanics, like in the classical case, the (pure) state describes “real”
inner properties of a system. However, in this case, in order to explain how the second
particle “gets to know” the chosen spin direction of the first particle, one has to accept
instantaneous action at a distance, contradicting the physical “locality principle”.

2) The state vector is only an expression of the information about the preparation 
procedure. In that case there is no contradiction with locality, but one has to abandon 
the complete mechanistic description of a system state as the totality of its “inner 
properties”. 

However, a thorough consideration of the EPR experiment led Bell to a much deeper
conclusion: if one tries to describe the quantum correlations classically and
simultaneously attempts to satisfy the principle of locality, one cannot achieve the level of
correlation between systems A and B predicted by quantum theory. “Local realism”
and the correlations (3.14) predicted by quantum theory are incompatible.

This is established with the help of the Clauser-Horne-Shimony-Holt inequality
which is a convenient modification of Bell’s original inequality:

Let $X_j$, $Y_k$ ($j, k = 1, 2$) be arbitrary random variables on one probability space
$\Omega$, such that $|X_j| \leq 1$, $|Y_k| \leq 1$. In this case, for any probability distribution $P$ on
$\Omega$, the correlations of these random variables satisfy

$$\left|\mathsf{E} X_1 Y_1 + \mathsf{E} X_1 Y_2 + \mathsf{E} X_2 Y_1 - \mathsf{E} X_2 Y_2\right| \leq 2, \tag{3.17}$$

where $\mathsf{E}$ is the expectation corresponding to the distribution $P$.

The proof is obtained by averaging over the distribution $P$ the elementary inequality

$$-2 \leq X_1 Y_1 + X_1 Y_2 + X_2 Y_1 - X_2 Y_2 \leq 2,$$

which in turn follows from

$$\left|X_1 Y_1 + X_1 Y_2 + X_2 Y_1 - X_2 Y_2\right| \leq |Y_1 + Y_2| + |Y_1 - Y_2| \leq 2\max\{|Y_1|, |Y_2|\} \leq 2.$$


Let us go back to the system of two qubits and consider the four different
experiments in which the spin observables $\sigma(\vec{a}_j)$ ($j = 1, 2$) are measured over the first
qubit, and $\sigma(\vec{b}_k)$ ($k = 1, 2$) in the second, while the directions $\vec{a}_j$, $\vec{b}_k$ ($j, k = 1, 2$)
will be chosen later. In all four experiments the system is prepared in the singlet
state (3.12). Assume that there exists a “local” classical description reproducing the
statistical results of all these four experiments. This means that there is a probability
space $\Omega$, a probability distribution on $\Omega$ describing the statistical ensemble in the
singlet state, and the random variables $X_j$, $Y_k$; $j, k = 1, 2$, taking values $\pm 1$, which
classically describe the spin observables $\sigma(\vec{a}_j)$, $\sigma(\vec{b}_k)$; $j, k = 1, 2$. In this case, the
quantum correlations (3.14) have to satisfy the inequality (3.17). The requirement of
“locality” (or, better to say, separability of the two subsystems) is expressed by the
fact that the random variables that describe the spin of the first particle ($X_1$ in the first
two correlations and $X_2$ in the other two cases) are assumed to be the same for the
experiments with different directions of spin of the second particle ($Y_1$ or $Y_2$), and
vice versa for $Y_1$ and $Y_2$, which allows us to apply the inequality (3.17). Omitting this
requirement would mean that the given classical description allows the choice of a
measurement on the second particle to influence the inner characteristics of the first
particle (the term “locality” is used in connection with the fact that the two particles
are supposed to be spatially separated).

Figure 3.1. Choice of the vectors $\vec{a}_j$ and $\vec{b}_k$.

Now, choose the spin directions as shown in Figure 3.1. Substituting the correlations
given by formula (3.14) into the left hand side of (3.17) gives the value $2\sqrt{2}$,
which breaks the inequality. Thus, assuming the possibility of a classical “local”
description is incorrect. The requirement of separability appears so natural that it
is not easy to notice. However, it is this condition which forbids instantaneous
influence of measurements in one subsystem on another. If this condition is
abandoned, the four correlations in question can be arbitrary numbers in $[-1, 1]$, and the
left hand side of the inequality (3.17) can be bounded only by the value 4, which does
not contradict quantum theory.

Therefore, one has a dilemma: either quantum theory gives incorrect predictions for
the correlations, or the composite system of two qubits has no classical local
(separable) description. Several experiments were made, starting with Aspect’s experiments
in 1981-1982, the results of which essentially confirm the predictions of quantum
theory.
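
The violation of (3.17) can be reproduced with a few lines of linear algebra. The sketch below (Python with NumPy) is an illustration under stated assumptions: since Figure 3.1 is not reproduced here, the four directions are an assumed standard choice in the $x$-$z$ plane at angles $0$, $\pi/2$ and $\pm\pi/4$ from the $z$ axis, and all function names are hypothetical. It also enumerates all deterministic $\pm 1$ assignments to confirm the classical bound.

```python
import itertools
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def sigma(a):
    """Spin observable in direction a = (a_x, a_y, a_z)."""
    return a[0] * sx + a[1] * sy + a[2] * sz

def direction(theta):
    """Unit vector in the x-z plane at angle theta from the z axis."""
    return np.array([np.sin(theta), 0.0, np.cos(theta)])

up = np.array([1, 0], dtype=complex)
down = np.array([0, 1], dtype=complex)
singlet = (np.kron(up, down) - np.kron(down, up)) / np.sqrt(2)   # state (3.12)

def corr(a, b):
    """Quantum correlation <psi| sigma(a) (x) sigma(b) |psi> in the singlet state."""
    return np.vdot(singlet, np.kron(sigma(a), sigma(b)) @ singlet).real

# Assumed measurement directions (Figure 3.1 is not available here).
a1, a2 = direction(0.0), direction(np.pi / 2)
b1, b2 = direction(np.pi / 4), direction(-np.pi / 4)

chsh_quantum = corr(a1, b1) + corr(a1, b2) + corr(a2, b1) - corr(a2, b2)

# Classical bound: enumerate all deterministic assignments of +1 / -1.
chsh_classical_max = max(
    abs(x1 * y1 + x1 * y2 + x2 * y1 - x2 * y2)
    for x1, x2, y1, y2 in itertools.product([-1, 1], repeat=4)
)

print(abs(chsh_quantum), chsh_classical_max)
```

Randomized classical strategies are averages of the deterministic ones, so they cannot exceed the enumerated maximum.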
3.2.2 Mermin-Peres game

The fact that quantum entanglement between parts of a composite system is
profoundly different from any kind of classical connection, and in some situations can
provide a real advantage over the latter, is demonstrated in a most spectacular, nearly
grotesque, way by “quantum pseudotelepathy games”. We will describe one of these,
the Mermin-Peres magic square game, which uses nothing but the basic postulates of
quantum theory.

In the following game, a team of two players, A (Alice) and B (Bob), tries to win
against a croupier C. Imagine a $3 \times 3$ matrix in which C chooses a row and a column
and announces the number $i$ of the row to A and the number $j$ of the column to B.
A and B are allowed to agree on any joint strategy prior to this announcement, but
cannot communicate after it has been made. Thus, A does not know the number of
the column for B and vice versa. Then A must put $+1$ or $-1$ in each cell of her row,
subject to the constraint that the product of elements must be $1$. Similarly B must put
$+1$ or $-1$ in each cell of his column subject to the constraint that the product of elements
must be $-1$. They win if their choices coincide on the intersection $ij$ of the row and
the column chosen by the croupier.

In order to win, players A and B could agree a priori on some fixed $3 \times 3$ matrix
with elements $\pm 1$. However, it is easy to see that there is no matrix that satisfies
the imposed constraints: from the constraint on A (resp. B) the product of all matrix
elements should be equal to $1$ (resp. $-1$). They could adopt a randomized strategy,
but the probability of success will always be strictly less than 1.

However, assuming that they are capable of producing and sharing an entangled
quantum state before C makes his announcement, and make pre-agreed quantum
measurements on their respective parts of this state after the announcement, they will
always be able to win! Of course, both the entangled state and the measurements are
independent of the choices of C, as it should be. Hence, the rules of the game are
perfectly respected and it is the use of advanced quantum information technology that
produces a result that is unachievable by solely classical means. From the point of
view of a purely classical observer it looks like there is some immaterial connection
between A and B, hence the name “pseudotelepathy”.

For a pair of qubits A and B, let us introduce the Bell state

$$|\Psi\rangle_{AB} = \frac{1}{\sqrt{2}}\left(|\uparrow\uparrow\rangle + |\downarrow\downarrow\rangle\right),$$

which is similar to the singlet state (3.12).

The prepared entangled state is the tensor product of two Bell states of two pairs of
qubits $A_1B_1$ and $A_2B_2$:

$$|\Psi\rangle = |\Psi\rangle_{A_1B_1} \otimes |\Psi\rangle_{A_2B_2}.$$

Before C makes his announcement, qubits $A_1$ and $A_2$ are distributed to player A,
while qubits $B_1$ and $B_2$ are given to B. Players A and B also agree in advance that,
as soon as they receive the numbers of the row $i$ and the column $j$, respectively, they
will perform the spin measurements on their pairs of qubits, according to the following
table

$$\begin{bmatrix}
\sigma_0 \otimes \sigma_z & \sigma_z \otimes \sigma_0 & \sigma_z \otimes \sigma_z \\
\sigma_x \otimes \sigma_0 & \sigma_0 \otimes \sigma_x & \sigma_x \otimes \sigma_x \\
-\sigma_x \otimes \sigma_z & -\sigma_z \otimes \sigma_x & \sigma_y \otimes \sigma_y
\end{bmatrix}$$

and write the outcome of the measurement in the corresponding cell. At this point,
we should expand on the special properties of this array of operators.
we should expand on the special properties of this array of operators. 

Denoting by $X_{ij}$ the operator at the intersection of the $i$-th row and $j$-th column, one 
has 

i. $X_{ij} = X_{ij}^*$ and $X_{ij}^2 = \sigma_0 \otimes \sigma_0 = I$, so that all $X_{ij}$ are (sharp) real observables 
with eigenvalues $\pm 1$; 

ii. in any row $i$, the operators $X_{ij}$; $j = 1,2,3$ commute. Hence, the observables are 
compatible. Moreover, $X_{i1}X_{i2}X_{i3} = I$. Hence for any value $i = 1,2,3$ given 
by C player A can make a joint measurement of the observables $X_{ij}$; $j = 1,2,3$, 
obtaining the outcomes $+1$ or $-1$, which obey the constraint for A. Next, A 
writes these outcomes in row $i$. A similar description applies to player B and any 
column $j$. 

iii. miraculously, the numbers written at the intersection of the $i$-th row and $j$-th column 
by A and B always coincide. This follows from the identity 

$$\left(X_{ij}^A \otimes X_{ij}^B\right)|\Psi\rangle = |\Psi\rangle; \quad i,j = 1,2,3,$$

where $X_{ij}^A$ (resp. $X_{ij}^B$) is the operator $X_{ij}$ in system $A = A_1A_2$ (resp. 
$B = B_1B_2$). Indeed, this identity means that, if the whole system $A_1A_2B_1B_2$ is 
prepared in the state $|\Psi\rangle$, the product of the outcomes of the (compatible) 
measurements of A and B, in any cell $ij$, will be equal to 1, i.e. the outcomes coincide. 

Exercise 3.16. Prove the properties i., ii. by using the multiplication table (2.17) 
for the Pauli matrices. Prove the property iii. by using the identities 

$$(\sigma_{x,z} \otimes \sigma_0)|\Psi\rangle_{AB} = (\sigma_0 \otimes \sigma_{x,z})|\Psi\rangle_{AB};$$

$$(\sigma_y \otimes \sigma_0)|\Psi\rangle_{AB} = -(\sigma_0 \otimes \sigma_y)|\Psi\rangle_{AB},$$

similar to (2.17). 
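
The properties of the array of operators can also be checked numerically. The following is a minimal sketch in plain Python, assuming the standard matrix representation of the Pauli operators and ordering the four qubits as $A_1 A_2 B_1 B_2$; the helper names (`kron`, `matmul`, `apply`) are illustrative and not part of the text.

```python
# Pauli matrices as nested lists (standard representation, assumed here)
S0 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]

def kron(a, b):
    """Kronecker (tensor) product of two square matrices."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def apply(m, v):
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(v))]

I4 = kron(S0, S0)

# The array X[i][j] from the table (0-based indices)
X = [
    [kron(S0, SZ), kron(SZ, S0), kron(SZ, SZ)],
    [kron(SX, S0), kron(S0, SX), kron(SX, SX)],
    [scale(-1, kron(SX, SZ)), scale(-1, kron(SZ, SX)), kron(SY, SY)],
]

# Property ii: operators in a row commute and multiply to +I;
# operators in a column multiply to -I.
commute_ok = all(close(matmul(X[i][j], X[i][k]), matmul(X[i][k], X[i][j]))
                 for i in range(3) for j in range(3) for k in range(3))
rows_ok = all(close(matmul(matmul(X[i][0], X[i][1]), X[i][2]), I4)
              for i in range(3))
cols_ok = all(close(matmul(matmul(X[0][j], X[1][j]), X[2][j]), scale(-1, I4))
              for j in range(3))

# |Psi> = Bell state on A1B1 (x) Bell state on A2B2, with index bits (a1, a2, b1, b2)
psi = [0.0] * 16
for a1 in (0, 1):
    for a2 in (0, 1):
        psi[(a1 << 3) | (a2 << 2) | (a1 << 1) | a2] = 0.5

# Property iii: (X^A (x) X^B)|Psi> = |Psi> for every cell
cells_ok = all(
    all(abs(x - y) < 1e-12
        for x, y in zip(apply(kron(X[i][j], X[i][j]), psi), psi))
    for i in range(3) for j in range(3)
)
print(commute_ok, rows_ok, cols_ok, cells_ok)
```

Such a check does not replace the proof asked for in the exercise; it only confirms the signs in the table.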
3.3 Quantum systems as information carriers 

3.3.1 Transmission of classical information 

If the information carrier is a classical system with a finite number of states $d$, the 
maximal number of binary digits, or bits, which can be recorded or transmitted by 
such a carrier is equal to $\log_2 d$. If the transmission is without errors, one speaks of 
an ideal communication channel. In general, a noisy classical channel is described 
by the conditional probabilities $p(y|x)$ of receiving the signal $y$ at the output, if the 
value at the input was $x$. For the ideal channel $p(y|x) = \delta_{xy}$. The opposite case is the 
channel for which the probabilities $p(y|x) = p(y)$ do not depend on the transmitted 
signal $x$ and hence the information is not transmitted at all. A more detailed account of 
classical channels will be given in Ch. 4. 

Let us now consider how classical information can be transmitted with a quantum 
carrier described by a Hilbert space $\mathcal{H}$. The carrier’s state $S$ is prepared with 
macroscopic devices. By changing the parameters of a device, one changes the parameters 
of the state, thus having a possibility to “record” classical information into the 
quantum state. Let there be $n$ different signals and $S_x$ be the quantum state corresponding 
to the signal $x = 1,\dots,n$. The mapping $x \to S_x$ describes the net result of a physical 
process that generates the state $S_x$. A detailed description of such a process is beyond 
the scope of information theory, for which only the resulting states $S_x$ are relevant. 

To extract the classical information contained in the output state of a channel, one 
has to perform a measurement. If one measures the observable $M = \{M_y\}$, the 
conditional probability of obtaining a signal $y$ at the output given the input signal $x$ is 
equal to 

$$p_M(y|x) = \operatorname{Tr} S_x M_y. \tag{3.18}$$

Thus, for a fixed quantum measurement, we have the usual classical channel. Assume 
that one manages to prepare states and perform a measurement in such a way that 
$S_x = |e_x\rangle\langle e_x|$, $M_y = |e_y\rangle\langle e_y|$ for an orthonormal basis $\{e_x\}$; then $p_M(y|x) = \delta_{xy}$, i.e. one has an ideal classical 
channel, capable of transmitting $\log_2 d$ bits, where $d = \dim \mathcal{H}$. Later, in Chapter 5, 
we will establish that this quantity is the upper bound for the amount of classical 
information that can in principle be transmitted with the given quantum carrier. This 
implies the following important consequences: 

1) The fact that Hilbert space contains infinitely many different state vectors does not 
aid us in transmitting an unlimited amount of information. The more states are used for 
transmission, the closer they are to each other and hence they become less and less 
distinguishable. 

2) The dimensionality of the Hilbert space is a measure of the ultimate information 
resource of the corresponding quantum system. 

3.3.2 Entanglement and local operations 

In this section, we will need elementary knowledge about quantum evolutions. Later, 
in Ch. 6, we will provide a detailed discussion of this question from the viewpoint of 
the general theory. Meanwhile, it is sufficient to know that reversible evolutions of a 
quantum system are described by unitary operators $U$. The vector of the initial pure 
state $\psi$ is transformed into $U\psi$. Correspondingly, a density operator $S$ is transformed 
into $USU^*$. 

Consider now the following question. The nonlocal, from a classical viewpoint, 
nature of the EPR-correlations suggests attempting to use them for instantaneous 
information transmission. Let us show that this is not possible within the quantum 
mechanical framework, so that the EPR correlations do not contradict locality. Consider 
two quantum systems A and B, in the corresponding spaces $\mathcal{H}_A$ and $\mathcal{H}_B$, which are 
in an entangled state $S_{AB}$. In the case of practical interest, the systems are sufficiently 
spatially separated, although this is not reflected by the formulas. The system A 
(transmitter) uses the classical messages $x$, which should be transmitted to B (receiver), in 
order to perform certain unitary operations $U_x$ in the space $\mathcal{H}_A$ (such operations are 
called local, i.e. acting nontrivially only in one of the two systems). This brings the 
state of the system AB into $S_x = (U_x \otimes I_B) S_{AB} (U_x \otimes I_B)^*$, and in this way the 
classical information is written into the quantum state of the composite system. In this 
case, observer B makes a local measurement of an observable $\{M_y\}$ in $\mathcal{H}_B$, corresponding 
to the resolution of the identity $M = \{I_A \otimes M_y\}$ in the space of the composite system 
$\mathcal{H}_A \otimes \mathcal{H}_B$. The resulting conditional probability (3.18) is 

$$p_M(y|x) = \operatorname{Tr}\,(U_x \otimes I_B) S_{AB} (U_x \otimes I_B)^* (I_A \otimes M_y) = \operatorname{Tr} S_B M_y,$$

where $S_B = \operatorname{Tr}_A S_{AB}$ is the partial state of the system B, which does not depend on 
$x$. This means that local operations of observer A do not in any way influence the 
state of B, and hence, no information is transmitted. 
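
The independence of B's statistics from the local operation of A can be illustrated on two qubits. The sketch below is only an illustration under stated assumptions: the entangled state $\sqrt{0.8}\,|00\rangle + \sqrt{0.2}\,|11\rangle$, the choice of Pauli matrices as the operations $U_x$, and the measurement of B in the basis $|0\rangle, |1\rangle$ are all hypothetical choices, not taken from the text.

```python
import math

# Local unitaries U_x available to A (Pauli matrices, an illustrative choice)
PAULI = {
    "0": [[1, 0], [0, 1]],
    "x": [[0, 1], [1, 0]],
    "y": [[0, -1j], [1j, 0]],
    "z": [[1, 0], [0, -1]],
}

def apply_on_A(u, psi):
    """Apply the 2x2 unitary u to qubit A; psi[a][b] is the amplitude of |a b>."""
    return [[sum(u[a][k] * psi[k][b] for k in (0, 1)) for b in (0, 1)]
            for a in (0, 1)]

def marginal_B(psi):
    """p(y) = Tr S_B M_y for B's measurement in the basis |0>, |1>."""
    return [sum(abs(psi[a][y]) ** 2 for a in (0, 1)) for y in (0, 1)]

# Hypothetical entangled state sqrt(0.8)|00> + sqrt(0.2)|11>
psi = [[math.sqrt(0.8), 0.0], [0.0, math.sqrt(0.2)]]

# B's outcome distribution for each message x, i.e. for each local unitary of A
marginals = {g: marginal_B(apply_on_A(u, psi)) for g, u in PAULI.items()}
for g, p in marginals.items():
    print(g, [round(v, 6) for v in p])
```

Whatever operation A applies, the distribution seen by B is the one determined by the partial state $S_B$ alone.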

Consider now the different scenario where observer A makes a local measurement 
of an observable $M^A = \{M_x^A\}$ and sends the outcome to observer B, who takes this into account 
by selecting the subensemble where this outcome $x$ has appeared. In this case, the 
posterior state $S_B(x)$ of the party B will depend on the outcome of the measurement 
$x$. To find the posterior state, assume that B performs his local measurement $M^B = \{M_y^B\}$, 
and consider the joint probability distribution of the two measurements 

$$p_{x,y} = \operatorname{Tr} S_{AB}\,(M_x^A \otimes M_y^B).$$

By introducing the conditional probabilities $p(y|x) = p_{x,y}/p_x$ (assuming $p_x > 0$), 
we have 

$$p_x = \operatorname{Tr} S_{AB}\,(M_x^A \otimes I_B) = \operatorname{Tr} S_A M_x^A,$$

$$p(y|x) = \operatorname{Tr} S_B(x) M_y^B,$$

where 

$$S_B(x) = p_x^{-1} \operatorname{Tr}_A S_{AB}\,(M_x^A \otimes I_B)$$

are the posterior states. 

The change of the state of observer B from $S_B$ to $S_B(x)$ means a reduction of the 
full statistical ensemble of B to the subensemble characterized by the value $x$ of the 
outcome of the measurement of observer A. If this outcome is not taken into account 
(for example, A does not send it to B, hence B makes no selection), then the state of 
B is not changed: 

$$S_B = \sum_x p_x S_B(x).$$


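Exercise 3.17. Let $S_{AB} = |\psi\rangle\langle\psi|$ be a pure entangled state, 

$$|\psi\rangle = \sum_x |e_x^A\rangle \otimes |\psi_x^B\rangle,$$

where $\{e_x^A\}$ is an orthonormal basis in $\mathcal{H}_A$, $\sum_x \|\psi_x^B\|^2 = 1$. Assume that A 
measures the observable $M_x^A = |e_x^A\rangle\langle e_x^A|$ and sends the measurement outcome $x$ to 
the party B. Prove that $p_x = \|\psi_x^B\|^2$ and the posterior states of B are 

$$S_B(x) = p_x^{-1}\,|\psi_x^B\rangle\langle\psi_x^B|. \tag{3.19}$$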

Coming back to the EPR experiment, we obtain from the relation (3.16), that in 
the case where A measures the spin observable $\sigma(\vec{a})$ and sends the outcome $\pm 1$ to 
observer B, the posterior state of B (i.e. the state of the corresponding statistical 
subensemble) is $|{\mp}\vec{a}\rangle\langle{\mp}\vec{a}|$. On the other hand, if there is no communication between 
A and B, the state of B remains chaotic, independently of the measurement of A. 
Thus, quantum theory does not contradict the principle of locality. 

3.3.3 Superdense coding 

Although EPR-correlations cannot aid in transmitting information from A to B, it 
appears that using these correlations as an additional resource allows one to double the 
amount of classical information transmitted from A to B if they are connected by an 
ideal quantum channel enabling perfect transmission of a quantum state from A to B. 
Hence, EPR correlations appear as a “catalyst” for classical information transmission 
and from this viewpoint can be considered a unit of a specific information resource 
(sometimes called an ebit). 

Consider the two systems A and B, each of which is a qubit, connected by an 
ideal quantum channel. The maximum amount of classical information that can be 
transmitted from A to B is equal to 1 bit, and is obtained by encoding this bit into two 
orthogonal vectors, say, 

$$0 \to |0\rangle, \quad 1 \to |1\rangle,$$

where $|0\rangle$, $|1\rangle$ is the canonical basis for one qubit. 

The protocol of “superdense coding”, which allows us to double the amount of 
transmitted classical information, is based on the following simple mathematical fact: 
all vectors of the Bell basis 

$$|e_+\rangle = |00\rangle + |11\rangle, \quad |e_-\rangle = |00\rangle - |11\rangle, \quad |h_+\rangle = |10\rangle + |01\rangle, \quad |h_-\rangle = |10\rangle - |01\rangle$$

in the system of two qubits AB (we will systematically omit the normalizing factor 
$1/\sqrt{2}$) can be obtained from one of these vectors by the action of “local” unitary 
operators, i.e. operators acting nontrivially only on qubit A. Namely, 

$$|e_-\rangle = (\sigma_z \otimes I)|e_+\rangle, \quad |h_+\rangle = (\sigma_x \otimes I)|e_+\rangle, \quad |h_-\rangle = -i(\sigma_y \otimes I)|e_+\rangle.$$

Thus, if AB is initially in the state $|e_+\rangle$, A can encode 2 bits of classical information 
into 4 states of the Bell basis by mere local operations $\sigma_\gamma$; $\gamma = 0, x, y, z$, and then 
(physically) send its qubit to B via the ideal quantum channel. Then, making the 
measurement in the Bell basis, B receives 2 bits of classical information. 
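
The protocol can be simulated directly on two-qubit state vectors. The following sketch is an illustration, not a prescribed implementation: the assignment of 2-bit messages to Pauli operators (`ENCODE`) and the function name `send` are assumptions made here, and the normalizing factor $1/\sqrt{2}$ is restored so that the probabilities come out properly.

```python
import math

PAULI = {
    "0": [[1, 0], [0, 1]],
    "x": [[0, 1], [1, 0]],
    "y": [[0, -1j], [1j, 0]],
    "z": [[1, 0], [0, -1]],
}

R = 1 / math.sqrt(2)
# Bell basis as amplitude tables psi[a][b] for |a b> (normalized)
BELL = {
    "e+": [[R, 0], [0, R]],
    "e-": [[R, 0], [0, -R]],
    "h+": [[0, R], [R, 0]],
    "h-": [[0, -R], [R, 0]],
}

# Hypothetical convention: which Pauli operator encodes which 2-bit message,
# and which Bell outcome B decodes to which message.
ENCODE = {(0, 0): "0", (0, 1): "x", (1, 0): "z", (1, 1): "y"}
DECODE = {"e+": (0, 0), "h+": (0, 1), "e-": (1, 0), "h-": (1, 1)}

def apply_on_A(u, psi):
    """Apply the 2x2 unitary u to qubit A of the state psi[a][b]."""
    return [[sum(u[a][k] * psi[k][b] for k in (0, 1)) for b in (0, 1)]
            for a in (0, 1)]

def inner(u, v):
    """Inner product <u|v> of two two-qubit states."""
    return sum(complex(u[a][b]).conjugate() * v[a][b]
               for a in (0, 1) for b in (0, 1))

def send(bits):
    """A encodes two bits by a local Pauli operation; B measures in the Bell basis."""
    state = apply_on_A(PAULI[ENCODE[bits]], BELL["e+"])
    probs = {name: abs(inner(vec, state)) ** 2 for name, vec in BELL.items()}
    outcome = max(probs, key=probs.get)      # the only outcome with nonzero probability
    return DECODE[outcome], probs[outcome]

for bits in ENCODE:
    print(bits, send(bits))
```

The global phase factor $-i$ in the relation for $|h_-\rangle$ plays no role here, since the outcome probabilities depend only on absolute values of inner products.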

3.3.4 Quantum teleportation 

So far, we considered transmission of classical information. This information can be 
“written” into a quantum state and transmitted through a physical channel. However, 
the quantum state itself is an information resource, since it carries statistical 
uncertainty. The information carried by an unknown quantum state is qualitatively different 
from the classical one, and deserves the special name quantum information. The most 
apparent distinction of quantum information is the impossibility of copying (no cloning). 
While the classical information can be reproduced in an arbitrary number of copies, 
there is no room for a physical device that could perform the same task for quantum 
information. Indeed, the cloning transformation 

$$|\psi\rangle \to \underbrace{|\psi\rangle \otimes \cdots \otimes |\psi\rangle}_{n}$$

is nonlinear, and hence cannot be implemented by a unitary operator. Of course, any 
fixed state (or even a collection of orthogonal states) may be copied by a specific 
device, but there is no universal device which would do this for an arbitrary state. 

Let us briefly discuss the question of how a quantum state can be transmitted. 
A straightforward way is to just send the quantum system itself. Much more 
interesting and nontrivial is teleportation of the state, where the system itself is not 
transmitted, but instead a certain classical message is sent. An essential additional resource 
is the EPR-correlations between the input and the output, which again play the role 
of “catalyst”. Note that teleportation of a quantum state cannot be accomplished by 
merely sending classical information without using the entanglement. Since the 
classical information can be copied, this would mean the possibility of cloning quantum 
information. 

Let there be two quantum systems A and B, describing input and output of the 
communication channel, respectively. In the simplest and basic version, the systems 
A and B are just qubits. 

i. Before the transmission, the system AB is prepared in the state $|00\rangle + |11\rangle$. 
(Recall that we omit the normalizing factor $1/\sqrt{2}$.) 

ii. The third party C provides A with an arbitrary pure state 

$$|\psi\rangle = a|0\rangle + b|1\rangle,$$

which should be transmitted to B. Now, the combination of the three systems 
CAB is in the state 

$$(a|0\rangle + b|1\rangle) \otimes (|00\rangle + |11\rangle).$$

iii. Then observer A performs a certain reversible local transformation of the state of 
CA. 

iv. Observer A performs a measurement in CA with 4 outcomes (comprising 2 bits 
of classical information) and transmits the outcome of its measurement to B by 
using the ideal classical communication channel. The transformation and the 
measurement will be described below. 

v. Depending on the outcome received, observer B performs a certain transformation 
of its state, obtaining the required state vector $|\psi\rangle$. 

The necessary transformations are examples of quantum “gates”, used in quantum 
computations. In step iii. the state of CA is transformed by the operation CNOT 
(controlled “not”): 

$$|00\rangle \to |00\rangle, \quad |01\rangle \to |01\rangle, \quad |10\rangle \to |11\rangle, \quad |11\rangle \to |10\rangle,$$

where the state of the first qubit is unchanged, whilst the state of the second qubit is 
changed to its opposite if and only if the state of the first qubit is $|1\rangle$. As this 
transformation turns one basis into another basis, it is a unitary operator in the 4-dimensional 
space of CA. Now, the qubit C is transformed by the Hadamard gate $H$, implemented 
by the unitary matrix 

$$H = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.$$

Then the basis is rotated by the angle $\pi/4$: 

$$|0\rangle \to \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle), \quad |1\rangle \to \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle).$$

The initial state of the total system CAB is 

$$a|000\rangle + b|100\rangle + a|011\rangle + b|111\rangle.$$

The action of CNOT on CA results in 

$$a|000\rangle + b|110\rangle + a|011\rangle + b|101\rangle.$$

Then the Hadamard gate $H$ acts on C, and we obtain 

$$a(|000\rangle + |100\rangle) + b(|010\rangle - |110\rangle) + a(|011\rangle + |111\rangle) + b(|001\rangle - |101\rangle).$$

Singling out the state of the subsystem CA we have 

$$|00\rangle(a|0\rangle + b|1\rangle) + |01\rangle(a|1\rangle + b|0\rangle) + |10\rangle(a|0\rangle - b|1\rangle) + |11\rangle(a|1\rangle - b|0\rangle).$$

Now, in step iv., the measurement in the subsystem CA is performed, which projects 
onto one of four basis vectors $|00\rangle, |01\rangle, |10\rangle, |11\rangle$. According to formula (3.19), the 
posterior state of the system B, depending on the obtained measurement outcome, is 
described by one of the vectors 

$$a|0\rangle + b|1\rangle, \quad a|1\rangle + b|0\rangle, \quad a|0\rangle - b|1\rangle, \quad a|1\rangle - b|0\rangle.$$

The outcome of the measurement is sent to B through the ideal classical channel. 
In step v., depending on the outcome, B applies one of the unitary Pauli operators 
$\sigma_0 = I, \sigma_x, \sigma_z, \sigma_y$, in each case transforming the state of B into $a|0\rangle + b|1\rangle$ 
(up to an inessential phase factor). Thus, the system B ends up in the state in which 
the system C was initially, while the 
state of C is irreversibly lost (otherwise cloning of the quantum information would be 
possible). 
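
The computation above can be reproduced numerically. The sketch below is an illustration under explicit assumptions: the state of CAB is stored as eight amplitudes indexed by `c*4 + a*2 + b`, the normalizing factors are restored, and the correction for the outcome $|11\rangle$ is implemented as $\sigma_z\sigma_x$, which coincides with $\sigma_y$ up to a phase factor. The function names are hypothetical.

```python
import math

def cnot_CA(v):
    """CNOT with control C and target A on amplitudes v[c*4 + a*2 + b]."""
    out = [0j] * 8
    for c in (0, 1):
        for a in (0, 1):
            for b in (0, 1):
                out[c * 4 + (a ^ c) * 2 + b] += v[c * 4 + a * 2 + b]
    return out

def hadamard_C(v):
    """Hadamard gate on qubit C."""
    r = 1 / math.sqrt(2)
    out = [0j] * 8
    for rest in range(4):                  # rest = a*2 + b
        v0, v1 = v[rest], v[4 + rest]
        out[rest] = r * (v0 + v1)          # new amplitude for c = 0
        out[4 + rest] = r * (v0 - v1)      # new amplitude for c = 1
    return out

def correct(c, a, phi):
    """B's correction for the outcome (c, a); phi = [amplitude of |0>, of |1>]."""
    x0, x1 = phi
    if a == 1:                             # sigma_x: swap |0> and |1>
        x0, x1 = x1, x0
    if c == 1:                             # sigma_z: flip the sign of |1>
        x1 = -x1
    return [x0, x1]

def teleport(alpha, beta):
    """Return, for each outcome (c, a), its probability and B's corrected state."""
    r = 1 / math.sqrt(2)
    v = [0j] * 8
    for c, amp in ((0, alpha), (1, beta)):
        v[c * 4 + 0] = amp * r             # |c 0 0>
        v[c * 4 + 3] = amp * r             # |c 1 1>
    v = hadamard_C(cnot_CA(v))
    results = {}
    for c in (0, 1):
        for a in (0, 1):
            phi = [v[c * 4 + a * 2], v[c * 4 + a * 2 + 1]]
            p = sum(abs(x) ** 2 for x in phi)          # probability of (c, a)
            phi = [x / math.sqrt(p) for x in phi]      # posterior state, cf. (3.19)
            results[(c, a)] = (p, correct(c, a, phi))
    return results

for outcome, (p, state) in teleport(0.6, 0.8j).items():
    print(outcome, round(p, 6), state)
```

Each of the four outcomes occurs with probability 1/4, so the two classical bits sent to B carry no information about $a$ and $b$ by themselves.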

3.4 Notes and references 

1. For more detail about tensor product of Hilbert spaces, see Ch. II.4 of the 
book [171], where one can also find solutions to the Exercises in this section. For 
a general formulation of Naimark’s Dilation Theorem, see [159], where it was proved 
for an arbitrary resolution of the identity (probability operator-valued measure, see 
Definition 11.29) in infinite dimensional Hilbert space. The motivation was to obtain 
self-adjoint extensions of symmetric operators but it was later recognized that this was 
just the first discovery of the remarkable general phenomenon that the properties of 
various operator objects can be improved by extension to a wider underlying space. In 
Quantum Information Theory this principle received the colorful name “the Church 
of Larger Hilbert Space”. The purification of quantum states and the unitary dilation 
of irreversible quantum dynamics in Section 6.3 are two of its other manifestations. 

An analog of the decomposition (3.8) in separable Hilbert space dates back to 
Schmidt’s paper of 1907, which is devoted to integral equations. The “purification 
map” was introduced by Powers and Stormer [169] and studied by Woronowicz [229] 
in connection with the problem of the quasi-equivalence of states on a C* -algebra. 
The trick of purifying mixed states, which is widely used in quantum information 
theory, also appeared in the paper of Lindblad [149]. 

The observation that the PPT condition is necessary for separability is due to 
Peres [166] and Horodeckis [115], while the latter showed that this condition is 
also sufficient for separability of states in 2 x 2- and 2 x 3-dimensional composite 
systems. In higher dimensions there exist PPT states that are not separable, but their 
entanglement (called bound entanglement ) is “weak”, since it doesn’t allow for distil¬ 
lation, i.e. transformation to maximally entangled Bell states by using local operations 
and classical communication (LOCC) (P. Horodecki [116], Horodeckis [119]). 

2. The term “entanglement” (German “Verschranktheit”) was introduced by Schrödinger, 
who was the first to notice the unusual properties of entangled states. The 
careful logical analysis of the EPR experiment is due to Bell [18], who in particular 
established an inequality relative to (3.17) and applied it to analyze the EPR paradox. 
In the discussion of the Mermin-Peres game, we followed Aravind [8], where one can 
find further references. Experimental demonstrations of the quantum gain were also 
reported, see Zu et al. [231]. 

3. The protocol of superdense coding was proposed by Bennett and Wiesner [26]. The 
protocol of quantum teleportation was proposed in the famous publication of Bennett, 
Brassard, Crépeau, Jozsa, Peres and Wootters [19]. The possibility of teleportation 
of the polarization state of a photon was experimentally demonstrated by the group 
of Zeilinger in 1997. Since then, a number of further successful experiments have been 
performed, both on photons and on atoms. 

The primary coding theorems 




Chapter 4 

Classical entropy and information 


4.1 Entropy of a random variable and data compression 

Let $X$ be a random variable with values in a finite set $\mathcal{X}$ and let $P = \{p_x;\ x \in \mathcal{X}\}$ be 
its probability distribution. The entropy $H(X) = H(P)$ is defined as 

$$H(X) = \sum_{x \in \mathcal{X}} \eta(p_x), \tag{4.1}$$

where 

$$\eta(t) = \begin{cases} -t \log t, & t > 0, \\ 0, & t = 0. \end{cases} \tag{4.2}$$

Here and in what follows $\log = \log_2$ denotes the binary logarithm. The entropy 
$H(X)$ is a measure of uncertainty, of variability, or of the information content in the 
random variable $X$, as we shall see below. Without loss of generality we will assume 
in what follows that $p_x > 0$ for all $x$. 

Let $d$ be the number of elements in the set $\mathcal{X}$. One has 

$$0 \le H(P) \le \log d, \tag{4.3}$$

with the minimum value attained on degenerate distributions $\{1,0,\dots,0\},\dots,\{0,\dots,0,1\}$, 
and the maximum value on the uniform distribution $\{\frac{1}{d},\dots,\frac{1}{d}\}$. The 
first inequality follows from the non-negativity of the function $\eta(t)$ in the domain 
$[0,1]$, and the second from concavity of the function $t \to \log t$: 

$$H(P) = \sum_{x=1}^{d} p_x \log\frac{1}{p_x} \le \log \sum_{x=1}^{d} p_x \frac{1}{p_x} = \log d.$$
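
As a small numerical illustration of (4.1)-(4.3), the sketch below computes the entropy of three distributions on a four-letter alphabet. The example distributions are arbitrary choices made here, and the function names `eta` and `entropy` are illustrative.

```python
import math

def eta(t):
    """eta(t) = -t log2 t for t > 0 and 0 for t = 0, as in (4.2)."""
    return -t * math.log2(t) if t > 0 else 0.0

def entropy(p):
    """H(P) = sum_x eta(p_x), as in (4.1), in bits."""
    return sum(eta(t) for t in p)

d = 4
examples = {
    "degenerate": [1.0, 0.0, 0.0, 0.0],
    "skewed": [0.5, 0.25, 0.125, 0.125],
    "uniform": [0.25] * 4,
}
for name, p in examples.items():
    # Every value lies between 0 and log d, in agreement with (4.3)
    print(name, entropy(p), "log d =", math.log2(d))
```

The degenerate and the uniform distributions attain the lower and the upper bound in (4.3), respectively.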


Let us now explain the operational interpretation of the entropy $H(X)$ as a 
measure of the information content in the random variable $X$. Consider a “random source” 
which produces independent, identically distributed (i.i.d.) random variables with 
distribution $P$. A sequence $w = (x_1,\dots,x_n)$ of $n$ letters from the alphabet $\mathcal{X}$ is called 
a word of length $n$. The total number of such words is $|\mathcal{X}|^n = 2^{n\log|\mathcal{X}|}$. Hence 
it is possible to encode all these words using binary strings of length $n\log|\mathcal{X}|$, i.e. 
$n\log|\mathcal{X}|$ bits¹. However, by using the fact that the probability distribution $P$ of $X$ 
is in general non-uniform, one can in general achieve a much better encoding. The 
possibility of data compression is closely connected with the Asymptotic 
Equipartition Property. 

¹ To avoid inessential complications, we shall systematically neglect the fact that this number is not 
necessarily an integer. 

Theorem 4.1. If $X_1,\dots,X_n$ are i.i.d. random variables with distribution $P$, then 

$$-\frac{1}{n}\sum_{i=1}^{n} \log p_{X_i} \to H(X) \quad \text{in probability.} \tag{4.4}$$

This means that for any $\delta, \varepsilon > 0$ there exists an $n_0$, such that for all $n \ge n_0$ we 
have 

$$\mathsf{P}\left\{\left|-\frac{1}{n}\sum_{i=1}^{n} \log p_{X_i} - H(X)\right| < \delta\right\} > 1 - \varepsilon. \tag{4.5}$$

This is a direct consequence of the Law of Large Numbers applied to the i.i.d. random 
variables $-\log p_{X_i}$; $i = 1,\dots,n$, taking into account that $\mathsf{E}(-\log p_{X_i}) = H(X)$. 
Noting that the probability of the occurrence of a word $w = (x_1,\dots,x_n)$ is 

$$p_w = p_{x_1}\cdots p_{x_n} = 2^{-n\left(-\frac{1}{n}\sum_{i=1}^{n} \log p_{x_i}\right)}, \tag{4.6}$$

we can now use relation (4.5) to introduce a basic notion of the typical word: 

Definition 4.2. A word $w$, having probability $p_w$, is called $\delta$-typical if 

$$2^{-n(H(X)+\delta)} < p_w < 2^{-n(H(X)-\delta)}, \tag{4.7}$$

in other words, if it satisfies the event in the formula (4.5). 

The collection of all $\delta$-typical words of length $n$ will be denoted by $T^{n,\delta}$. By 
using (4.5), (4.7), it is straightforward to see that $\delta$-typical words have the following 
properties (where $\delta, \varepsilon$ are fixed positive numbers): 

Exercise 4.3. Prove the following statements: 

i. There are at most $2^{n(H(X)+\delta)}$ typical words, i.e. $|T^{n,\delta}| \le 2^{n(H(X)+\delta)}$. 

ii. For sufficiently large $n$, the set of all non-typical words has a probability 
$\mathsf{P}(\overline{T^{n,\delta}}) \le \varepsilon$. 

iii. For sufficiently large $n$, there are at least $(1-\varepsilon)\,2^{n(H(X)-\delta)}$ typical words. 

Now, one can do efficient data compression by using all binary sequences of length 
$n(H(X) + \delta)$ to encode all $\delta$-typical words, and encoding all non-typical words into one 
additional (idle) symbol. In this case, the error probability of such a coding scheme will 
be less than or equal to $\varepsilon$. 
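
For a binary source the typical set can be counted exactly, which gives a concrete feeling for the statements of Exercise 4.3. The sketch below is only an illustration: the source with probabilities $\{0.8, 0.2\}$, the word length $n = 1000$ and $\delta = 0.1$ are arbitrary choices made here, and `typical_set_stats` is a hypothetical helper name.

```python
import math

def typical_set_stats(p1, n, delta):
    """Entropy, size and probability of the delta-typical set of a binary i.i.d. source.

    p1 is the probability of the letter 1. Words are grouped by their number k
    of ones, since all words with the same k have the same probability p_w.
    """
    p0 = 1.0 - p1
    h = -p0 * math.log2(p0) - p1 * math.log2(p1)           # H(X)
    size, prob = 0, 0.0
    for k in range(n + 1):
        log_pw = (n - k) * math.log2(p0) + k * math.log2(p1)
        if abs(-log_pw / n - h) < delta:                   # condition (4.7)
            size += math.comb(n, k)                        # number of such words
            prob += math.comb(n, k) * 2.0 ** log_pw        # their total probability
    return h, size, prob

n, delta = 1000, 0.1
h, size, prob = typical_set_stats(0.2, n, delta)
print("H(X) =", h)
print("log2 |T| / n =", math.log2(size) / n, " upper bound H + delta =", h + delta)
print("P(T) =", prob)
```

The typical words carry almost all of the probability, although they form an exponentially small fraction of all $2^n$ words.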
Conversely, any code that uses only binary sequences of length $n(H(X) - \delta)$ has an 
asymptotically non-vanishing error probability, which in fact can be made arbitrarily 
close to 1 for $n$ large enough. Indeed, let $C$ be the collection of words that are used 
for the encoding into binary strings of length $n(H(X) - \delta)$, while all other words are encoded 
into the idle symbol. In this case, the error probability is $1 - \mathsf{P}(C)$, where 

$$\mathsf{P}(C) = \mathsf{P}(C \cap T^{n,\delta/2}) + \mathsf{P}(C \cap \overline{T^{n,\delta/2}}) \tag{4.8}$$

$$\le 2^{n(H(X)-\delta)}\, 2^{-n(H(X)-\delta/2)} + \mathsf{P}(\overline{T^{n,\delta/2}}) \tag{4.9}$$

$$\le 2^{-n\delta/2} + \varepsilon, \tag{4.10}$$

which can be made arbitrarily small for $n$ large enough. 

Since we need asymptotically $N \sim 2^{nH(X)}$ codewords for efficient encoding, 
$H(X)$ can be interpreted as a measure of the information content of the source in 
bits per letter. It is also clear that if we start with a uniform distribution $p_x = \frac{1}{|\mathcal{X}|}$, then 
$H(X) = \log|\mathcal{X}|$ and no data compression is possible. 

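Exercise 4.4. Let $\mathcal{X}$ be a finite subset of $\mathbb{R}$ and let $X_1,\dots,X_n,\dots$ be i.i.d. 
random variables with values in $\mathcal{X}$. Assuming the conditions of Theorem 4.1, 
prove the following large deviations inequality: 

$$\mathsf{P}\left\{\frac{1}{n}\sum_{i=1}^{n} X_i - \mathsf{E}X \ge \delta\right\} \le \exp\left\{-n \sup_{s>0}\,[s\delta - \mu(s)]\right\}, \tag{4.11}$$

where 

$$\mu(s) = \ln \mathsf{E}\exp s(X - \mathsf{E}X) = o(s) \quad \text{for } s \to 0.$$

This implies the Law of Large Numbers with exponential decay of tails, since 
$s\delta - \mu(s) > 0$ for small $s > 0$. 

Hint: prove and apply the following version of the Markov inequality 

$$\mathsf{P}\{X \ge x\} = \mathsf{P}\{e^{sX} \ge e^{sx}\} \le \exp(-sx)\,\mathsf{E}e^{sX}$$

for any $x \in \mathbb{R}$ and $s > 0$. 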

4.2 Conditional entropy and the Shannon information 

Let $X$, $Y$ be random variables on one probability space with joint probability 
distribution $\{p_{x,y}\}$. In this case, one can define their joint entropy $H(XY)$ similar to 
relation (4.1). The conditional entropy $H(Y|X)$ is defined as 

$$\begin{aligned}
H(Y|X) &= \sum_x p_x H(Y|X = x) \\
&= -\sum_x p_x \sum_y p(y|x) \log p(y|x) \\
&= -\sum_{x,y} p_{x,y} \log p_{x,y} + \sum_x p_x \log p_x \\
&= H(XY) - H(X).
\end{aligned} \tag{4.12}$$

There is a useful general rule stating that any linear relation for (conditional) entropies 
continues to hold if, in every term, a condition is supplemented with one and the same 
extra condition. For example, (4.12) implies 

$$H(XY|Z) = H(X|Z) + H(Y|XZ). \tag{4.13}$$

A noisy communication channel is described by the transition probability $p(y|x)$ 
from the input alphabet $\mathcal{X}$ to the output alphabet $\mathcal{Y}$, i.e. a collection of conditional 
probabilities that a letter $y \in \mathcal{Y}$ is received when a letter $x \in \mathcal{X}$ is sent. An input 
probability distribution $P = \{p_x\}$ is transformed by the channel to the output 
probability distribution $P' = \{p'_y\}$, where $p'_y = \sum_x p(y|x)\,p_x$. The Shannon information 
(about the random variable $X$ contained in $Y$) is defined as 

$$I(X;Y) = H(Y) - H(Y|X), \tag{4.14}$$

where the entropy $H(Y)$ can be interpreted as the information content of the output, 
while the conditional entropy $H(Y|X)$ is interpreted as its useless component that 
arises from the noise in the channel. Substituting (4.12) into this expression for the 
Shannon information, we see that it is symmetric in $X$ and $Y$, and hence it can also 
be called the mutual information 

$$I(X;Y) = H(X) + H(Y) - H(XY). \tag{4.15}$$

Furthermore, $I(X;Y) = H(X) - H(X|Y)$, where $H(X)$ can now be interpreted as 
the information content of the input, and $H(X|Y)$ is sometimes referred to as loss. 
An explicit expression for the Shannon information via the input distribution and the 
channel transition probabilities is given by 

$$I(X;Y) = \sum_{x,y} p_x\, p(y|x) \log\left(\frac{p(y|x)}{\sum_{x'} p(y|x')\,p_{x'}}\right). \tag{4.16}$$
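
The equivalence of the expressions (4.14), (4.15) and (4.16) is easy to confirm numerically. The sketch below uses a hypothetical binary channel and input distribution chosen only for illustration; the variable names are not from the text.

```python
import math

def entropy(dist):
    """Entropy in bits of a probability distribution given as a list."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Hypothetical input distribution p_x and transition probabilities channel[x][y] = p(y|x)
px = [0.3, 0.7]
channel = [[0.9, 0.1], [0.2, 0.8]]

joint = [[px[x] * channel[x][y] for y in range(2)] for x in range(2)]   # p_{x,y}
py = [joint[0][y] + joint[1][y] for y in range(2)]                      # p'_y

h_x = entropy(px)
h_y = entropy(py)
h_xy = entropy([v for row in joint for v in row])
h_y_given_x = sum(px[x] * entropy(channel[x]) for x in range(2))        # first line of (4.12)

i_414 = h_y - h_y_given_x                                               # (4.14)
i_415 = h_x + h_y - h_xy                                                # (4.15)
i_416 = sum(joint[x][y] * math.log2(channel[x][y] / py[y])              # (4.16)
            for x in range(2) for y in range(2) if joint[x][y] > 0)

print(h_y_given_x, h_xy - h_x)
print(i_414, i_415, i_416)
```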


A very useful quantity from classical statistics is the relative entropy of two 
probability distributions $P = \{p_x\}$, $Q = \{q_x\}$ (the Kullback-Leibler-Sanov entropy): 

$$H(P;Q) = \begin{cases} \sum_{x:\,p_x>0} p_x \log\dfrac{p_x}{q_x}, & \text{if } \{x: p_x > 0\} \subseteq \{x: q_x > 0\}; \\ +\infty & \text{otherwise.} \end{cases}$$

By using the inequality $\ln t \le t - 1$, we obtain 

$$H(P;Q) \ge -\log e \sum_{x:\,p_x>0} p_x \left(\frac{q_x}{p_x} - 1\right) \ge 0, \tag{4.17}$$

with equality if and only if $P = Q$. 

The quantity $H(P;Q)$ plays an important role as an “asymmetric distance” 
between the probability distributions. It has the following monotonicity property: 
consider the channel $r(y|x)$ and two input probability distributions $P = \{p_x\}$, 
$Q = \{q_x\}$, which are transformed to the output distributions $P' = \{p'_y\}$, $Q' = \{q'_y\}$, 

$$p'_y = \sum_x r(y|x)\,p_x, \qquad q'_y = \sum_x r(y|x)\,q_x.$$

In this case, 

$$H(P';Q') \le H(P;Q). \tag{4.18}$$

In fact, writing $p_{x,y} = r(y|x)\,p_x = p'_y\,p(x|y)$ and similarly $r(y|x)\,q_x = q'_y\,q(x|y)$, we have 

$$\begin{aligned}
H(P;Q) &= \sum_{x,y} p_x\, r(y|x) \log\frac{r(y|x)\,p_x}{r(y|x)\,q_x} \\
&= \sum_{x,y} p_{x,y} \left(\log\frac{p'_y}{q'_y} + \log\frac{p(x|y)}{q(x|y)}\right) \\
&= H(P';Q') + \sum_y p'_y\, H\big(P(x|y);\, Q(x|y)\big) \\
&\ge H(P';Q').
\end{aligned}$$


Exercise 4.5. Prove that $I(X;Y) = H(p_{x,y};\ p_x \cdot p_y)$. Hence, due to (4.17), 
$I(X;Y) \ge 0$, while $I(X;Y) = 0$ if and only if $X$ and $Y$ are independent random 
variables: $p_{x,y} = p_x \cdot p_y$. 

This also implies monotonicity of the conditional entropy 

$$0 \le H(Y|X) \le H(Y),$$

and sub-additivity of the entropy, 

$$H(XY) \le H(X) + H(Y). \tag{4.19}$$
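
The non-negativity (4.17) and the monotonicity property (4.18) of the relative entropy can be spot-checked on randomly generated distributions and channels. This is a sketch for illustration only: the alphabet sizes, the random seed and the helper names are arbitrary assumptions, and a numerical check of this kind is of course not a proof.

```python
import math
import random

def relative_entropy(p, q):
    """H(P;Q) in bits; +infinity unless the support of P lies in the support of Q."""
    if any(a > 0 and b == 0 for a, b in zip(p, q)):
        return math.inf
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

def output(dist, r):
    """Output distribution p'_y = sum_x r(y|x) p_x for the channel r[x][y]."""
    return [sum(dist[x] * r[x][y] for x in range(len(dist)))
            for y in range(len(r[0]))]

def random_dist(k, rng):
    """A random strictly positive probability distribution on k points."""
    w = [rng.random() + 1e-3 for _ in range(k)]
    s = sum(w)
    return [v / s for v in w]

rng = random.Random(0)
values, gaps = [], []
for _ in range(200):
    p, q = random_dist(3, rng), random_dist(3, rng)
    r = [random_dist(4, rng) for _ in range(3)]      # random channel with 3 inputs, 4 outputs
    before = relative_entropy(p, q)
    after = relative_entropy(output(p, r), output(q, r))
    values.append(before)
    gaps.append(before - after)                      # should be >= 0 by (4.18)

print(min(values), min(gaps))
```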
Exercise 4.6. Consider the concatenation of two channels $p(y|x)$ and $q(z|y)$, 
defined as $r(z|x) = \sum_y q(z|y)\,p(y|x)$. Let $X$ be a random variable at the input 
of the channel $p(y|x)$ having the probability distribution $\{p(x)\}$, and denote by $Y$ 
and $Z$ the outputs of the channels $p(y|x)$ and $r(z|x)$, resp., so that $X \to Y \to Z$ 
is a Markov chain. In this case, the data-processing inequalities hold, 

$$I(X;Z) \le \min\{I(X;Y),\ I(Y;Z)\}. \tag{4.20}$$

Hint: $I(X;Z) \le I(X;YZ) = I(X;Y)$ follows from $H(X|Z) \ge H(X|YZ) = H(X|Y)$, 
and analogously for the second quantity. 

The Shannon information has the following expression in terms of the relative 
entropy: 

$$I(X;Y) = \sum_x p_x\, H(P'_x; P'), \tag{4.21}$$

where $P'_x = \{p(y|x)\}$ for fixed $x \in \mathcal{X}$, and $P' = \sum_x p_x P'_x$ is the distribution of $Y$. 

4.3 The Shannon capacity of the classical noisy channel 

Consider the classical channel $X \to Y$ determined by the transition probability 
$p(y|x)$. The most important characteristic of the channel is the Shannon capacity 

$$C_{\mathrm{Shan}} = \max_X I(X;Y), \tag{4.22}$$

where the maximum is taken over all possible distributions $P = \{p_x\}$ of the input 
$X$. It will later be identified as the operationally defined information transmission 
capacity of the channel. 

As an example, let us consider the binary symmetric channel. Assume that $\mathcal{X}$ and 
$\mathcal{Y}$ consist of two letters 0, 1, each of which is transmitted correctly with probability $p$ and flipped with 
probability $1 - p$. Introducing the binary entropy 

$$h_2(p) = -p \log p - (1 - p) \log(1 - p), \tag{4.23}$$

the mutual information can be written as $I(X;Y) = H(Y) - H(Y|X) = 
H(Y) - h_2(p)$, and is thus bounded by the quantity 

$$C_{\mathrm{Shan}} = 1 - h_2(p), \tag{4.24}$$

which is achieved for the uniform input distribution: $p_0 = p_1 = \frac{1}{2}$. 
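
The value (4.24) can be compared with a direct maximization of (4.16) over input distributions. The sketch below uses a plain grid search, which is a simple stand-in chosen here for illustration and not a method suggested by the text; the value $p = 0.9$ and the grid step are arbitrary.

```python
import math

def h2(p):
    """Binary entropy (4.23) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_information(px, channel):
    """I(X;Y) via (4.16); channel[x][y] = p(y|x)."""
    ny = len(channel[0])
    py = [sum(px[x] * channel[x][y] for x in range(len(px))) for y in range(ny)]
    total = 0.0
    for x, p in enumerate(px):
        for y in range(ny):
            if p > 0 and channel[x][y] > 0:
                total += p * channel[x][y] * math.log2(channel[x][y] / py[y])
    return total

p = 0.9                                   # probability of correct transmission
channel = [[p, 1 - p], [1 - p, p]]        # binary symmetric channel

grid = [k / 1000 for k in range(1001)]    # candidate values of p_0
best_p0 = max(grid, key=lambda t: mutual_information([t, 1 - t], channel))
best = mutual_information([best_p0, 1 - best_p0], channel)

print("maximizing p_0:", best_p0)
print("max I(X;Y):", best, " 1 - h2(p):", 1 - h2(p))
```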

In general, $I(X;Y)$ is a concave function of the input probability distribution 
$P = \{p_x\}$. Hence, the Kuhn-Tucker conditions from convex analysis, see e.g. [42], 
can be applied to characterize the optimal input distribution $P$ that maximizes $I(X;Y)$. 
Exercise 4.7. Show that the entropy $H(P)$ is a concave function of the 
probability distribution $P$. Hint: use concavity of the function $t \to \eta(t)$. Therefore, the 
output entropy $H(Y) = H(P')$ is also concave, while the conditional entropy 
$H(Y|X)$ is affine. Use formula (4.14) to conclude that $I(X;Y)$ is concave. 

Exercise 4.8. (Kuhn-Tucker conditions) Let $F(P)$ be a concave function of the 
probability distribution $P$, continuously differentiable in the interior of the 
simplex of all probability distributions on $\mathcal{X}$, and having possibly infinite limits of 
the partial derivatives at its boundary. Prove the following statement: a necessary 
and sufficient condition for the point $P^0$ to be the maximizer for $F(P)$ is that 
there exists a $\lambda$ such that 

$$\frac{\partial F(P^0)}{\partial p_x} \begin{cases} = \lambda, & \text{if } p_x^0 > 0; \\ \le \lambda, & \text{if } p_x^0 = 0. \end{cases}$$

In the case of Shannon information (4.21), we find 

$$\frac{\partial I(X;Y)}{\partial p_x} = H(P'_x; P') - \log e,$$

which leads to the following “maximal distance” characterization of the optimal input 
distribution $P^0 = \{p_x^0\}$: there exists a $\mu$ such that 

$$H\big(P'_x; (P^0)'\big) \begin{cases} = \mu, & \text{if } p_x^0 > 0; \\ \le \mu, & \text{if } p_x^0 = 0, \end{cases} \tag{4.25}$$

in which case, by necessity $\mu = C_{\mathrm{Shan}}$. 
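
The characterization (4.25) can be observed on a channel whose optimal input is not uniform. The sketch below makes several assumptions of its own: the channel is a hypothetical "Z-channel" (letter 0 is always received correctly, letter 1 is flipped with probability 1/2), and the maximum is located by a ternary search, which is applicable because $I(X;Y)$ is concave in the input distribution.

```python
import math

def relative_entropy(p, q):
    """H(P;Q) in bits; +infinity unless the support of P lies in the support of Q."""
    if any(a > 0 and b == 0 for a, b in zip(p, q)):
        return math.inf
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

def output(px, channel):
    """Output distribution P' for the input distribution px."""
    return [sum(px[x] * channel[x][y] for x in range(len(px)))
            for y in range(len(channel[0]))]

def mutual_information(px, channel):
    """I(X;Y) = sum_x p_x H(P'_x; P'), formula (4.21)."""
    py = output(px, channel)
    return sum(p * relative_entropy(row, py) for p, row in zip(px, channel) if p > 0)

# Hypothetical Z-channel: channel[x][y] = p(y|x)
channel = [[1.0, 0.0], [0.5, 0.5]]

def f(t):
    """Mutual information as a function of t = probability of the input letter 1."""
    return mutual_information([1 - t, t], channel)

lo, hi = 0.0, 1.0
for _ in range(200):                       # ternary search for the maximum of a concave function
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if f(m1) < f(m2):
        lo = m1
    else:
        hi = m2
t_opt = (lo + hi) / 2

capacity = f(t_opt)
p_out = output([1 - t_opt, t_opt], channel)
distances = [relative_entropy(row, p_out) for row in channel]   # H(P'_x; (P^0)')

print("optimal input:", [1 - t_opt, t_opt])
print("capacity:", capacity)
print("distances:", distances)
```

Both input letters are used by the optimal distribution, and both relative entropies in (4.25) agree with the capacity within numerical accuracy.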

Assume that two channels $\{p_j(y_j|x_j)\}$; $j = 1,2$, are given, and consider the 
composite parallel channel consisting of the independent use of these two channels, 
described by the transition probability $\{p_1(y_1|x_1)\,p_2(y_2|x_2)\}$. 

Proposition 4.9. The Shannon capacity of the composite channel is 

$$(C_{\mathrm{Shan}})_{12} = (C_{\mathrm{Shan}})_1 + (C_{\mathrm{Shan}})_2. \tag{4.26}$$

Thus the Shannon capacity is additive for the parallel channels. 

Proof. Let $P^j$ be the optimal input probability distribution for the $j$-th channel. In 
this case, they satisfy the condition (4.25) with constants $\mu_j = (C_{\mathrm{Shan}})_j$. Using the 
fact that in general 

$$H(P^1 \times P^2;\ Q^1 \times Q^2) = H(P^1; Q^1) + H(P^2; Q^2),$$

we obtain that the probability distribution $P = P^1 \times P^2$ satisfies the condition (4.25) 
for the composite channel, with the constant $\mu = (C_{\mathrm{Shan}})_1 + (C_{\mathrm{Shan}})_2$. □ 

4.4 The channel coding theorem 

Given the channel p(y\x), one can consider the composite memoryless channel 

p(y n I*") = piyi 1*0 • • • p(yn\x n ), (4.27) 

which transmits words of length n, by using the initial channel n times, with each use 
being independent of the others, 



Xi - 

^ y 1 

x n - 

x 2 - 

y 2 


X n 

■4- yn 


By X n , Y n , we denote the random variables at the input and the output of the com¬ 
posite channel. 

Following (4.22), consider the quantity 

C n = ma x.I(X n \Y n ) 

x n 

for the composite channel. Inductively applying property (4.26), we find that the 
sequence {C„} is additive for memoryless channels, whence 

C„ = n Cshan • (4.28) 

To reduce the effect of noise, one uses encoding of the messages at the input and, correspondingly, decoding at the output. The whole process of information transmission is given by the diagram:

$$i \xrightarrow{\text{encoding}} x^n \xrightarrow{\text{channel}} y^n \xrightarrow{\text{decoding}} j, \tag{4.29}$$

where $i$ (resp. $j$) denotes the transmitted (resp. received) message. One can simply assume that $i, j \in \{1, \dots, N\}$ are the indices of the messages, since only the number $N$ of the transmitted messages is of importance.

The aim is to choose an encoding and decoding that maximize the transmission rate $R = \frac{\log N}{n}$, equal to the number of bits per transmitted symbol, while keeping the error probability small. Let us proceed to the exact formulation.

Definition 4.10. A code $(W, V)$ of size $N$ for the classical memoryless channel consists of a codebook $W$, which is a collection of $N$ codewords $w^{(1)}, \dots, w^{(N)}$ of length $n$, and a decision rule that is a decomposition of $\mathcal{Y}^n$ into $N + 1$ nonintersecting subsets $V = \{V^{(0)}, V^{(1)}, \dots, V^{(N)}\}$. The subsets $V^{(1)}, \dots, V^{(N)}$ can be interpreted as decision domains. Whenever $y^n \in V^{(j)}$ is received, we decide that the word $w^{(j)}$ was transmitted. If $y^n \in V^{(0)} = \overline{\bigcup_{j=1}^{N} V^{(j)}}$, no definite decision is made.
Thus, the maximal error probability of such a code is

$$P_e(W, V) = \max_{1 \le j \le N} \left(1 - p(V^{(j)}|w^{(j)})\right), \tag{4.30}$$

where $p(V^{(j)}|w^{(j)}) = \mathsf{P}(Y^n \in V^{(j)} | X^n = w^{(j)})$ are the probabilities of correct decisions. The mean error probability is equal to

$$\bar P_e(W, V) = \frac{1}{N} \sum_{j=1}^{N} \left(1 - p(V^{(j)}|w^{(j)})\right) \le P_e(W, V). \tag{4.31}$$

The following useful statement implies that criteria for information transmission based on the average error probability $\bar P_e(W, V)$ and on the maximal error probability $P_e(W, V)$ are asymptotically the same.

Lemma 4.11. For any code $(W, V)$ of size $2N$ with average error probability $\bar P_e(W, V)$ there exists a subcode $(\tilde W, \tilde V)$ of size $N$ which has a maximal error probability $P_e(\tilde W, \tilde V) \le 2 \bar P_e(W, V)$.

Proof. Put $\varepsilon = \bar P_e(W, V)$ and assume, to the contrary, that there are at least $N + 1$ codewords with error probability $1 - p(V^{(j)}|w^{(j)}) > 2\varepsilon$, so that it would not be possible to construct the required code of size $N$. In this case, the average error probability of the code of size $2N$ would be bounded from below, since

$$\bar P_e(W, V) > \frac{1}{2N}\, 2\varepsilon (N + 1) \ge \varepsilon,$$

which contradicts the assumption. □
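The construction behind Lemma 4.11 (often called expurgation) can be sketched as follows. This is an illustrative sketch only: the error probabilities are made-up numbers, and the function name `expurgate` is hypothetical. The subcode keeps the decision domains $V^{(j)}$ of the retained codewords and merges everything else into $V^{(0)}$, so the error probabilities of the retained codewords do not change.

```python
import random


def expurgate(error_probs):
    """Indices of the N codewords with the smallest error probability.

    error_probs[j] = 1 - p(V^(j) | w^(j)) for a code of size 2N.
    """
    if len(error_probs) % 2:
        raise ValueError("the code size must be even (2N)")
    order = sorted(range(len(error_probs)), key=error_probs.__getitem__)
    return order[:len(error_probs) // 2]


random.seed(0)
errors = [random.random() ** 3 for _ in range(20)]  # made-up error probabilities
kept = expurgate(errors)
average_error = sum(errors) / len(errors)
max_error_subcode = max(errors[j] for j in kept)
print(max_error_subcode <= 2 * average_error)  # prints True, as the lemma states
```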

Denote by $p_e(n, N)$ (resp. by $\bar p_e(n, N)$) the maximal (resp. average) error probability minimized over all codes of length $n$ and size $N$. Then

$$\frac{1}{2}\, p_e(n, N) \le \bar p_e(n, 2N) \le p_e(n, 2N). \tag{4.32}$$


Definition 4.12. The number $R > 0$ is called an achievable transmission rate if

$$\lim_{n \to \infty} p_e(n, 2^{nR}) = 0.$$

The supremum of all achievable rates is called the information capacity $C$ of the channel $p(y|x)$.

Theorem 4.13 (Shannon’s Channel Coding Theorem). For a memoryless channel,

$$C = C_{\mathrm{Shan}}.$$
Thus, the operationally (and asymptotically) defined transmission capacity is equal to the “one-letter” characteristic of the channel as defined by (4.22). The inequality $C \ge C_{\mathrm{Shan}}$ will follow from the statement

$$\lim_{n \to \infty} p_e(n, 2^{nR}) = 0, \quad \text{if } R < C_{\mathrm{Shan}},$$

which is called the direct statement of the coding theorem, while the weak converse amounts to

$$\liminf_{n \to \infty} p_e(n, 2^{nR}) > 0, \quad \text{if } R > C_{\mathrm{Shan}}, \tag{4.33}$$

which implies $C \le C_{\mathrm{Shan}}$. In fact, one can prove a stronger result:

$$\lim_{n \to \infty} p_e(n, 2^{nR}) = 1, \quad \text{if } R > C_{\mathrm{Shan}}.$$

Proof. The Weak Converse. Due to inequality (4.32), it is sufficient to prove the analog of the statement (4.33) for the average error probability $\bar p_e(n, N)$.


Lemma 4.14 (Fano’s Inequality). Let $X$, $Y$ be two random variables and let $\hat X = \hat X(Y)$ be an estimator of $X$ with error probability $p_e = \mathsf{P}(\hat X(Y) \ne X)$. In this case,

$$H(X|Y) \le h_2(p_e) + p_e \log(|\mathcal{X}| - 1) \le 1 + p_e \log|\mathcal{X}|. \tag{4.34}$$

Proof. Let $E$ denote an indicator of the estimation error,

$$E = \begin{cases} 0, & \text{if } \hat X(Y) = X; \\ 1, & \text{otherwise.} \end{cases} \tag{4.35}$$

Since $E$ is a function of $(X, \hat X)$ and thus is fixed for given values of $X$, $Y$, we have $H(X|Y) = H(E, X|Y)$. But (4.13) implies

$$H(E, X|Y) = H(E|Y) + H(X|E, Y). \tag{4.36}$$

Here $H(E|Y) \le H(E) = h_2(p_e)$ and

$$H(X|E, Y) = (1 - p_e) H(X|E = 0, Y) + p_e H(X|E = 1, Y) \le p_e \log(|\mathcal{X}| - 1),$$

where we have used that $H(X|E = 0, Y) = 0$, since $E = 0$ means that, given $Y$, we know $X$ exactly. The condition $E = 1$ is the same as $X \ne \hat X(Y)$, which leaves the possibility for $|\mathcal{X}| - 1$ values of $X$. Hence, the maximal possible value of the entropy is $\log(|\mathcal{X}| - 1)$. □
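Fano’s inequality can be illustrated numerically. The sketch below is not part of the original text: it assumes a randomly generated joint distribution and takes the maximum a posteriori guess as the estimator $\hat X(Y)$ (the inequality holds for any estimator); the function names are hypothetical.

```python
import math
import random


def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def fano_terms(joint):
    """Return (H(X|Y), p_e, Fano bound) for joint[x][y] = P(X=x, Y=y).

    The estimator is the maximum a posteriori guess of X given Y.
    """
    nx, ny = len(joint), len(joint[0])
    p_y = [sum(joint[x][y] for x in range(nx)) for y in range(ny)]
    h_cond = -sum(joint[x][y] * math.log2(joint[x][y] / p_y[y])
                  for x in range(nx) for y in range(ny) if joint[x][y] > 0)
    p_e = max(0.0, 1.0 - sum(max(joint[x][y] for x in range(nx))
                             for y in range(ny)))
    bound = h2(p_e) + p_e * math.log2(nx - 1)
    return h_cond, p_e, bound


random.seed(1)
raw = [[random.random() for _ in range(5)] for _ in range(4)]
norm = sum(map(sum, raw))
joint = [[v / norm for v in row] for row in raw]
h_cond, p_e, bound = fano_terms(joint)
print(h_cond, bound, 1 + p_e * math.log2(len(joint)))  # non-decreasing triple
```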

Consider an arbitrary code $(W, V)$ of size $N$, with codewords $w^{(1)}, \dots, w^{(N)}$ of length $n$ and a decomposition of $\mathcal{Y}^n$ into $N + 1$ decision domains

$$V^{(0)}, V^{(1)}, \dots, V^{(N)} \subset \mathcal{Y}^n.$$
Denote by $X^n$ a random variable taking the values $w^{(1)}, \dots, w^{(N)}$ with equal probabilities $\frac{1}{N}$, and let $\hat X^n = \hat X(Y^n)$ be an estimate for $X^n$ such that $\hat X^n = w^{(j)}$ if $Y^n \in V^{(j)}$. In this case,

$$C_n \ge I(X^n; Y^n) = H(X^n) - H(X^n|Y^n) \ge \log N \left(1 - \mathsf{P}\{\hat X^n \ne X^n\}\right) - 1, \tag{4.37}$$

by Fano’s inequality, where $\mathsf{P}\{\hat X^n \ne X^n\} = \bar P_e(W, V)$. Substituting $N = 2^{nR}$ and dividing by $nR$ leads to

$$\bar P_e(W, V) \ge 1 - \frac{C_n}{nR} - \frac{1}{nR}. \tag{4.38}$$

Taking the minimum over all possible codes, and using the additivity (4.28), we have in the limit $n \to \infty$

$$\liminf_{n \to \infty} \bar p_e(n, 2^{nR}) \ge 1 - \frac{C_{\mathrm{Shan}}}{R} > 0$$

for $R > C_{\mathrm{Shan}}$.

Proof of the Direct Statement. The main idea, due to Shannon, is to use random coding. Take $N$ independent codewords $w^{(1)}, \dots, w^{(N)}$ with the same probability distribution

$$\mathsf{P}\{w^{(i)} = (x_1, \dots, x_n)\} = p_{x_1} \cdots p_{x_n}, \tag{4.39}$$

where the input distribution $P = \{p_x\}$ is chosen such that it maximizes $I(X; Y)$. Note that we have approximately $2^{nH(X)}$ ($2^{nH(Y)}$) typical words for the input (output) and on average $2^{nH(Y|X)}$ typical outputs for every input codeword $w$. In order for the error in the discrimination between different words at the output to tend to zero, it is necessary that the sets of typical output words corresponding to different input words do not intersect asymptotically. Hence, the size of the code is

$$N \approx \frac{2^{nH(Y)}}{2^{nH(Y|X)}} = 2^{n(H(Y) - H(Y|X))} = 2^{nI(X;Y)} = 2^{nC_{\mathrm{Shan}}}. \tag{4.40}$$


To make this argument precise, let us call an output word $y^n$ conditionally typical for the input word $w = x^n$, if

$$2^{-n(H(Y|X) + \delta)} < p(y^n|w) < 2^{-n(H(Y|X) - \delta)}. \tag{4.41}$$

Denote by $T_w^{n,\delta} \subset \mathcal{Y}^n$ the subset of all conditionally typical words for the given input word $w$.


Exercise 4.15. Assuming that $w$ is random, with distribution (4.39), show, by using the Law of Large Numbers, that $\mathsf{P}\{Y^n \in \overline{T_w^{n,\delta}}\} \le \varepsilon$ for arbitrary $\varepsilon > 0$ and sufficiently large $n$ (the bar denotes the complement).
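For a concrete channel, the probability of the conditionally non-typical set can be computed exactly. The sketch below is an illustration under stated assumptions, not the general proof: it takes a binary symmetric channel with crossover probability `q`, for which $p(y^n|w)$ depends only on the number $k$ of flipped symbols and $H(Y|X) = h_2(q)$ for every input word; the function name is hypothetical.

```python
import math


def h2(p):
    """Binary entropy in bits, for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def nontypical_probability(n, q, delta):
    """P{Y^n not in T_w^{n,delta}} for n uses of a binary symmetric channel.

    For crossover probability q, p(y^n|w) = q^k (1-q)^(n-k), where k is the
    number of flipped symbols, so the answer is the same for every word w.
    """
    total = 0.0
    for k in range(n + 1):
        log2_p = k * math.log2(q) + (n - k) * math.log2(1 - q)
        if abs(-log2_p / n - h2(q)) >= delta:  # condition (4.41) is violated
            # binomial probability of exactly k flips, computed in log scale
            log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                       - math.lgamma(n - k + 1)
                       + k * math.log(q) + (n - k) * math.log(1 - q))
            total += math.exp(log_pmf)
    return total


for n in (200, 2000, 20000):
    print(n, nontypical_probability(n, q=0.1, delta=0.05))
```

The printed probabilities decrease towards zero as $n$ grows, in line with the Law of Large Numbers argument of the exercise.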
For a given encoding $W = \{w^{(1)}, \dots, w^{(N)}\}$, we construct a special suboptimal decoding. The subsets $T_{w^{(j)}}^{n,\delta}$ may be intersecting for different $j$. Hence, to obtain nonintersecting decision domains we put

$$V^{(j)} = T_{w^{(j)}}^{n,\delta} \cap \overline{\Big(\bigcup_{k \ne j} T_{w^{(k)}}^{n,\delta}\Big)}. \tag{4.42}$$

Now, if e.g. the word $w^{(1)}$ was transmitted, the error occurs if and only if

$$y^n \in \overline{V^{(1)}} = \overline{T_{w^{(1)}}^{n,\delta}} \cup \Big(\bigcup_{k=2}^{N} T_{w^{(k)}}^{n,\delta}\Big). \tag{4.43}$$

We now assume that the codewords $w^{(1)}, \dots, w^{(N)}$ are selected randomly, as described above, i.e. independently, each with distribution (4.39), and let us evaluate the expectation of the mean error probability $\bar P_e(W, V)$. This will give us the required estimate, since obviously

$$\bar p_e(n, N) \le \mathsf{E}\, \bar P_e(W, V).$$

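Denote by $\mathsf{P}_w\{B\}$ the probability $\mathsf{P}\{Y^n \in B \,|\, X^n = w\}$, where $B$ is some, possibly random, subset of $\mathcal{Y}^n$. Due to the complete symmetry between the codewords,

$$\mathsf{E}\, \bar P_e(W, V) = \mathsf{E}\, \mathsf{P}_{w^{(1)}}\{\overline{V^{(1)}}\} \tag{4.44}$$

$$= \mathsf{E}\, \mathsf{P}_{w^{(1)}}\{\overline{V^{(1)}} \cap T^{n,\delta}(Y)\} + \mathsf{E}\, \mathsf{P}_{w^{(1)}}\{\overline{V^{(1)}} \cap \overline{T^{n,\delta}(Y)}\},$$

where $T^{n,\delta}(Y)$ is the set of the output $\delta$-typical sequences $y^n$, defined similarly to Definition 4.2 with the replacement of $H(X)$ by $H(Y)$. Then, taking into account (4.43),

$$\mathsf{E}\, \bar P_e(W, V) \le \mathsf{P}\{Y^n \in \overline{T_{w^{(1)}}^{n,\delta}}\} + \sum_{k=2}^{N} \mathsf{E}\, \mathsf{P}_{w^{(1)}}\{T_{w^{(k)}}^{n,\delta} \cap T^{n,\delta}(Y)\} + \mathsf{P}\{\overline{T^{n,\delta}(Y)}\}. \tag{4.45}$$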

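The first and third terms are $\le \varepsilon$ for sufficiently large $n$ by the corresponding properties of the (conditionally) non-typical words. Each term in the second sum is evaluated as

$$\begin{aligned}
\mathsf{E}\, \mathsf{P}_{w^{(1)}}\{T_{w^{(k)}}^{n,\delta} \cap T^{n,\delta}(Y)\}
&= \sum_{w^{(1)}} \sum_{w^{(k)}} \sum_{y^n \in T_{w^{(k)}}^{n,\delta} \cap T^{n,\delta}(Y)} p(y^n|w^{(1)})\, p_{w^{(1)}}\, p_{w^{(k)}} \\
&\le 2^{n(H(Y|X) + \delta)} \sum_{w^{(1)}} \sum_{w^{(k)}} \sum_{y^n \in T^{n,\delta}(Y)} p(y^n|w^{(1)})\, p_{w^{(1)}}\, p(y^n|w^{(k)})\, p_{w^{(k)}} \\
&= 2^{n(H(Y|X) + \delta)} \sum_{y^n \in T^{n,\delta}(Y)} (p_{y^n})^2 \le 2^{n(H(Y|X) - H(Y) + 2\delta)}.
\end{aligned}$$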
Here, the first inequality is obtained by introducing the factor

$$2^{n(H(Y|X) + \delta)}\, p(y^n|w^{(k)}) > 1$$

on the set $T_{w^{(k)}}^{n,\delta}$, while the last one follows from the second inequality in the definition of a typical word $y^n$ and the fact that $\mathsf{P}\{T^{n,\delta}(Y)\} \le 1$. Finally,

$$\mathsf{E}\, \bar P_e(W, V) \le 2\varepsilon + (N - 1)\, 2^{n(H(Y|X) - H(Y) + 2\delta)} \le 2\varepsilon + N\, 2^{-n(C_{\mathrm{Shan}} - 2\delta)},$$

which can be made less than $3\varepsilon$ if $N = 2^{nR}$, with $R < C_{\mathrm{Shan}}$ and $\delta$ sufficiently small. □

It should be noted that the method of random encoding, which allows us to prove the existence of asymptotically optimal codes and which discloses their nature, is unfortunately not very suitable for practical applications. Its realization, even for moderate values of $n$, requires a huge (exponentially growing) amount of computation for the decoding. The development of practically acceptable methods of encoding and decoding is the subject of a special topic in information theory, coding theory, which makes ample use of the methods of modern algebra and combinatorics.

4.5 Wiretap channel 

Another important issue in modern information theory is the study of multiple user systems (networks), such as the Internet. A specific example of a system with a malicious party is the wiretap channel, which we will briefly discuss here, since it will be used in Ch. 10, which is devoted to quantum information transmission.

Let $\mathcal{X}$, $\mathcal{Y}$, $\mathcal{Z}$ be finite alphabets, where the alphabet $\mathcal{X}$ is associated with the transmitter, $\mathcal{Y}$ with the receiver, and $\mathcal{Z}$ with an eavesdropper. A wiretap channel is determined by a pair of channels $p(y|x)$; $q(z|x)$, one from transmitter to receiver and another from transmitter to eavesdropper. The transmitter sends the words $w = (x_1, \dots, x_n)$ of length $n$ through the channels $p(y^n|x^n)$, $q(z^n|x^n)$, and the goal is to achieve asymptotically exact transmission of a maximal amount of information to the receiver, subject to the condition that the eavesdropper receives an asymptotically negligible amount of information.

Definition 4.16. For a composite wiretap channel, the code $(\mathcal{M}, r, V)$ of size $N$ consists of a collection of messages $\mathcal{M} = \{1, \dots, N\}$, the transition probability $r(x^n|m)$ from $\mathcal{M}$ to $\mathcal{X}^n$, defining the randomized encoding of the messages $m \in \mathcal{M}$ into words $x^n$, and a decoding $V$ for the receiver, given by a decomposition of the set $\mathcal{Y}^n$ into $N + 1$ pairwise nonintersecting subsets $V^{(0)}, \dots, V^{(N)} \subset \mathcal{Y}^n$.

The maximal error probability of such a code is equal to

$$P_e(\mathcal{M}, r, V) = \max_{1 \le m \le N} \left(1 - p(V^{(m)}|m)\right), \tag{4.46}$$

where $p(V^{(m)}|m) = \sum_{x^n} \mathsf{P}(Y^n \in V^{(m)}|x^n)\, r(x^n|m)$ is the probability of a correct decision for the receiver. We call $R$ an achievable rate for the wiretap channel if there exists a sequence of codes $(\mathcal{M}^{(n)}, r^{(n)}, V^{(n)})$ of sizes $N = 2^{nR}$ such that

$$\lim_{n \to \infty} P_e(\mathcal{M}^{(n)}, r^{(n)}, V^{(n)}) = 0$$

and

$$\lim_{n \to \infty} I(M^n; Z^n) = 0,$$

where $M^n$ is a random variable, uniformly distributed over the set of messages $\mathcal{M}^{(n)}$. The last condition expresses the requirement that the eavesdropper’s information about the transmitted messages should tend to zero. The supremum of the achievable rates is called the private capacity $C_p$ of the wiretap channel.

The Coding Theorem for the wiretap channel provides the following expression:

$$C_p = \max\, [I(M; Y) - I(M; Z)], \tag{4.47}$$

where the maximum is taken over all possible triples of random variables $M$, $Y$, $Z$ such that the sequence $M$, $X$, $(Y, Z)$ is a Markov chain, while the pairs $X, Y$ and $X, Z$ are related via the channels $p(y|x)$ and $q(z|x)$, respectively.

The proof is based on the following idea. Let us fix $\delta, \varepsilon > 0$ and a distribution for $X$, and let $T^{n,\delta}$ be the set of $\delta$-typical words of length $n$. Consider the “pruned” random encoding $W^{(n)}$ which uses $N$ independent words, each uniformly distributed over the set $T^{n,\delta}$. Let $V^{(n)}$ be the suboptimal decoding for $Y$ constructed as in (4.42). A modification of the proof of Theorem 4.13 allows us to prove that for $N = 2^{n[I(X;Y) - \delta]}$ the inequality

$$P_e(W^{(n)}, V^{(n)}) \le \varepsilon \tag{4.48}$$

holds with high probability.

To make the code secret, the transmitter has to sacrifice $n[I(X;Z) + \delta/2]$ bits of information by additional randomization of the messages. Assuming $I(X;Y) - I(X;Z) > 0$, let

$$N_Z = 2^{n[I(X;Z) + \delta/2]}, \qquad N_Y = 2^{n[I(X;Y) - I(X;Z) - 3\delta/2]},$$

so that $N_Z N_Y = N$. Let us arrange the array of all codewords $W^{(n)}$ as a matrix with $N_Y$ rows and $N_Z$ columns. In this case,

$$W^{(n)} = \{w^{mj};\ m = 1, \dots, N_Y;\ j = 1, \dots, N_Z\}.$$

Next, for any message $m$ let the transmitter choose the value $j$ randomly, with equal probabilities. It can be shown that such a randomization hides almost all transmitted information from the eavesdropper. For every $m$ the collection of codewords

$$\{w^{mj};\ j = 1, \dots, N_Z\} \tag{4.49}$$
with high probability contains almost the maximal possible information $I(X;Z)$, subject to the condition that the eavesdropper uses the optimal decoding. In other words, for every value of $m$, the sets of conditionally typical words corresponding to (4.49) cover almost the whole set of typical words for $Z$. The mutual information between the collections of codewords with different values of $m$ is close to zero, so that randomization inside each collection erases almost all information going to the eavesdropper. This argument, together with relation (4.48), shows that the rate $I(X;Y) - I(X;Z) - \delta$ should be achievable, while randomization is equivalent to using an additional channel $M \to X$. A modification of a similar argument for any triple $M, X, YZ$ satisfying the conditions of the Coding Theorem allows us to show that $I(M;Y) - I(M;Z) - \delta$ is also an achievable rate.
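As an illustration of the quantity under the maximum in (4.47), the sketch below evaluates $I(X;Y) - I(X;Z)$ for the particular choice $M = X$. The example is an assumption made for illustration only: both the main and the eavesdropper’s channel are taken to be binary symmetric, with the eavesdropper’s channel noisier, and the helper names are hypothetical.

```python
import math


def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def mutual_information(p_in, channel):
    """I(X;Y) in bits, where channel[x][y] = p(y|x)."""
    n_out = len(channel[0])
    q = [sum(p_in[x] * channel[x][y] for x in range(len(p_in)))
         for y in range(n_out)]
    return sum(px * channel[x][y] * math.log2(channel[x][y] / q[y])
               for x, px in enumerate(p_in) for y in range(n_out)
               if px > 0 and channel[x][y] > 0)


def bsc(f):
    """Binary symmetric channel with crossover probability f."""
    return [[1 - f, f], [f, 1 - f]]


def secrecy_rate(p_in, main, wiretap):
    """I(X;Y) - I(X;Z): the expression in (4.47) for the choice M = X."""
    return mutual_information(p_in, main) - mutual_information(p_in, wiretap)


rate = secrecy_rate([0.5, 0.5], bsc(0.05), bsc(0.2))
print(rate, h2(0.2) - h2(0.05))  # the two numbers agree for uniform input
```

By (4.47), any such value is a lower bound on the private capacity $C_p$; whether it is attained by $M = X$ depends on the channel pair.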


4.6 Gaussian channel 


Consider the continuous analog of the memoryless channel (4.27), with the real line $\mathbb{R}$ as the input and output alphabets. The channel is defined by the transition probability density

$$p(y_i|x_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y_i - x_i)^2}{2\sigma^2}\right]; \quad i = 1, \dots, n.$$

Equivalently, one obtains the output random variables $Y_i$ by adding noise, given by independent identically distributed Gaussian random variables $Z_i$ with $\mathsf{E} Z_i = 0$ and $\mathsf{E} Z_i^2 = \sigma^2$, to the input signal $x_i$:

$$Y_i = x_i + Z_i; \quad i = 1, \dots, n.$$


Clearly, the naturally defined information capacity of such a channel is infinite if one does not introduce constraints on the signal. Usually, the quadratic constraint

$$\sum_{i=1}^{n} x_i^2 \le n s^2, \tag{4.50}$$

motivated by finiteness of the signal power, is considered. Under this constraint the channel capacity is computed as

$$C = \frac{1}{2} \log\left(1 + \frac{s^2}{\sigma^2}\right). \tag{4.51}$$


This formula is also due to Shannon, and it also has a simple heuristic explanation. The inequalities

$$\frac{1}{n} \sum_{i=1}^{n} Z_i^2 \le \sigma^2 + \varepsilon; \qquad \frac{1}{n} \sum_{i=1}^{n} Y_i^2 \le s^2 + \sigma^2 + \varepsilon$$

hold with high probability for arbitrary $\varepsilon > 0$ and sufficiently large $n$.

Exercise 4.17. Prove this statement by using the Markov inequality $\mathsf{P}\{X \ge \varepsilon\} \le \mathsf{E} X / \varepsilon$ for a nonnegative random variable $X$, and the relation (4.50).

The first of these inequalities means that, for any input word $w = (x_1, \dots, x_n)$, the output vector $Y^n = (Y_1, \dots, Y_n)$ lies in an $n$-dimensional ball with radius $\sqrt{n(\sigma^2 + \varepsilon)}$ and center $w$, while the second tells us that the output vectors for arbitrary inputs that satisfy constraint (4.50) lie in an $n$-dimensional ball of radius $\sqrt{n(s^2 + \sigma^2 + \varepsilon)}$, with its center at the origin. The ratio of the volumes of these balls gives an approximate value for the number of balls of the first type within the big ball,

$$N \approx \left[\frac{\sqrt{n(s^2 + \sigma^2 + \varepsilon)}}{\sqrt{n(\sigma^2 + \varepsilon)}}\right]^n = 2^{nR},$$

where $R = \frac{1}{2} \log\left(1 + \frac{s^2}{\sigma^2 + \varepsilon}\right)$, which can be made arbitrarily close to (4.51).
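The sphere-packing heuristic above is easy to evaluate numerically. The following sketch, with illustrative values of $s^2$ and $\sigma^2$ and hypothetical function names, compares the rate $R$ obtained from the ratio of radii with the capacity formula (4.51); logarithms are taken to base 2.

```python
import math


def gaussian_capacity(s2, sigma2):
    """Formula (4.51): C = (1/2) log2(1 + s^2 / sigma^2), in bits per use."""
    return 0.5 * math.log2(1 + s2 / sigma2)


def packing_rate(s2, sigma2, eps):
    """R = (1/n) log2 N with N = (ratio of the two radii)^n; n cancels out."""
    return math.log2(math.sqrt((s2 + sigma2 + eps) / (sigma2 + eps)))


s2, sigma2 = 3.0, 1.0  # illustrative signal power and noise variance
for eps in (1.0, 0.1, 0.001):
    print(eps, packing_rate(s2, sigma2, eps), gaussian_capacity(s2, sigma2))
```

As `eps` decreases, the printed rate increases towards the capacity value, which is the sense in which $R$ "can be made arbitrarily close to (4.51)".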


4.7 Notes and references 

1. The ideas of reliable data transmission have been promoted since the 1930s and even earlier (see Verdú [211] for a detailed historical survey). Remarkably, Kotelnikov, the Russian pioneer in the field, maintained a deep interest in the foundations of quantum mechanics throughout his life. The mathematical tradition in information theory goes back to the 1950s, when Khinchin and Kolmogorov gave insights into Shannon’s discoveries from the viewpoint of advanced probability theory. Nowadays, there are many excellent texts on information theory. Here, we mainly follow the book of Cover and Thomas [42], which provides an enlightening, nontechnical introduction to the subject. The method of types was systematically developed by Csiszár and Körner [44].

2. The idea of using the exponential function in the Markov inequality with subsequent large deviation estimates goes back to Bernstein, followed by Cramér. Its importance to information theory and statistics was emphasized in the work of Chernoff, see e.g. [42], Lemma 12.9.1, and also [66].

3. For a complete proof of Exercise 4.8 and its application to the additivity of the Shannon capacity, see Gallager [66], Theorems 4.4.1, 4.5.1.

4. The heuristic argument for the direct coding theorem was already present in Shannon’s pioneering paper [182]. Our proof differs from the one given in the book [42] in that it uses conditional rather than joint typicality. It is this approach that allows for the noncommutative extension, see Section 5.6.

5. The theory of multiuser systems is presented in the books of Csiszár and Körner [44] and Cover and Thomas [42]. The proof of the coding theorem for the wiretap channel is given in [44].

6. For a rigorous proof of the formula (4.51), see e.g. [42]. The quantum analog of the channel with additive Gaussian noise is considered in Section 12.1.4.



Chapter 5 

The classical-quantum channel 


5.1 Codes and achievable rates 

As was explained in Section 3.3.1, a simple mathematical model of a quantum communication channel is given by a map $x \to S_x$ that transforms the letters $x$ of a (finite) input alphabet $\mathcal{X}$ into the quantum states at the channel output. We will call such a model a classical-quantum (c-q) channel. If an observable $M = \{M_y\}$ is measured at the output of such a channel, the conditional probability to obtain the outcome $y$, under the condition that the signal $x$ was sent, is given by the formula

$$p(y|x) = \mathrm{Tr}\, S_x M_y. \tag{5.1}$$

Thus, for a fixed quantum measurement we have a usual classical channel. This leads to the question as to the maximum amount of classical information that can be transmitted through the quantum communication channel with asymptotically vanishing error, i.e. its “classical capacity”. This question will be considered in detail in the present chapter.

Consider the composite c-q channel which maps a word $w = (x_1, \dots, x_n)$ into the product state $S_w = S_{x_1} \otimes \cdots \otimes S_{x_n}$ in the space $\mathcal{H}^{\otimes n}$. The process of transmission of classical information is described by the diagram

$$i \xrightarrow{\text{encoding}} w \xrightarrow{\text{channel}} S_w \xrightarrow{\text{decoding}} j. \tag{5.2}$$

The assumption that a word $w$ is mapped into the tensor product of states $S_{x_j}$ corresponds to the definition of a memoryless channel in the classical case. At the channel output, there is a receiver who performs a measurement of an observable $M = \{M_j\}$ in the space $\mathcal{H}^{\otimes n}$ (the outcome $j$ means that the receiver decides that the signal $j$ was sent). Thus, the resolution of the identity in the space $\mathcal{H}^{\otimes n}$ describes the overall statistics of the decision procedure, including both the physical measurement and the subsequent classical information processing. The choice of the observable $M$ is formally similar to the choice of the decision rule in the classical case, but here it plays a much more significant role, as we shall see.

Definition 5.1. A code $(W, M)$ of size $N$ consists of a classical codebook $W = \{w^{(i)};\ i = 1, \dots, N\}$, which is a collection of $N$ codewords of length $n$, and a quantum decision rule given by an observable $M = \{M_j;\ j = 0, 1, \dots, N\}$ in $\mathcal{H}^{\otimes n}$. An outcome $j = 1, \dots, N$ corresponds to the decision that the word $w^{(j)}$ was sent, while $j = 0$ means no definite decision.
The probability of decoding the signal $i$ as $j$ is equal to

$$p_{WM}(j|i) = \mathrm{Tr}\, S_{w^{(i)}} M_j, \quad j = 0, 1, \dots, N. \tag{5.3}$$

In this case, the probability of a correct decision is $p_{WM}(i|i) = \mathrm{Tr}\, S_{w^{(i)}} M_i$, while the maximal error probability of the code $(W, M)$ is equal to

$$P_e(W, M) = \max_{1 \le j \le N}\, [1 - p_{WM}(j|j)]. \tag{5.4}$$

The average error probability is

$$\bar P_e(W, M) = \frac{1}{N} \sum_{j=1}^{N}\, [1 - p_{WM}(j|j)]. \tag{5.5}$$

In what follows we denote by

$$p_e(n, N) = \min_{W, M} P_e(W, M), \qquad \bar p_e(n, N) = \min_{W, M} \bar P_e(W, M), \tag{5.6}$$

the maximal and the average error, respectively, minimized over all codes $(W, M)$ of size $N$, using words of length $n$.

Definition 5.2. The classical capacity $C$ of the c-q channel $x \to S_x$ is equal to the supremum of the transmission rates $R$ that are achievable, in the sense that $\lim_{n \to \infty} \bar p_e(n, 2^{nR}) = 0$ or, equivalently (by inequalities (4.32)), that $\lim_{n \to \infty} p_e(n, 2^{nR}) = 0$.


5.2 Formulation of the coding theorem 

The notion of the entropy of a quantum state is a natural extension of the entropy of a probability distribution. Let $S$ be a density operator in a $d$-dimensional Hilbert space $\mathcal{H}$, and let

$$S = \sum_{j=1}^{d} s_j |e_j\rangle\langle e_j|$$

be its spectral decomposition. Its eigenvalues $s_j$ form a probability distribution. The von Neumann entropy of the density operator $S$ is defined as

$$H(S) = -\sum_{j=1}^{d} s_j \log s_j = \sum_{j=1}^{d} \eta(s_j), \tag{5.7}$$

where the function $\eta(\cdot)$ is given by (4.2), and is equal to the Shannon entropy of the probability distribution $\{s_j\}$. In what follows we use the more compact expression $H(S) = -\mathrm{Tr}\, S \log S$, although the logarithm is not defined for degenerate density operators. As in the case of distributions, this will not lead to ambiguities. From (4.3) it follows that

$$0 \le H(S) \le \log d,$$

with the minimum achieved on pure states, and the maximum on the chaotic state $S = \frac{1}{d} I$. As in the classical case, the entropy is a measure of the uncertainty and also of the information content of the state (this last statement will be substantiated in Section 5.5).

The following properties of the quantum entropy are easy to check: 

Exercise 5.3.

i. Unitary invariance: $H(VSV^*) = H(S)$, where $V$ is a unitary operator. Moreover, this equality holds for any operator $V$ that is isometric on the support of $S$;

ii. Additivity: $H(S_1 \otimes S_2) = H(S_1) + H(S_2)$.

In earlier works on quantum communications, the quantity

$$\chi(\{\pi_x\}; \{S_x\}) = H\Big(\sum_x \pi_x S_x\Big) - \sum_x \pi_x H(S_x), \tag{5.8}$$

with $\pi = \{\pi_x\}$ a probability distribution on $\mathcal{X}$, was used to evaluate the capacity of the c-q channel $x \to S_x$ on heuristic grounds. The quantity (5.8) can be considered a formal quantum analog of the Shannon information, where the first term plays the role of the output entropy, while the second plays that of the conditional entropy of the output with respect to the input. Remarkably, this quantity appears to be strictly related to the classical capacity of the c-q channel as defined in the previous section. When the states $\{S_x\}$ are fixed and there is no risk of confusion, we shall abbreviate the notation (5.8) as $\chi(\pi)$ and put

$$C_\chi = \max_\pi \chi(\pi). \tag{5.9}$$

The main goal of the next sections will be the proof of the following Coding Theorem:

Theorem 5.4 (Holevo [97]; Schumacher and Westmoreland [177]). The classical capacity $C$ of the c-q channel $x \to S_x$, defined as in Definition 5.2, is equal to $C_\chi$.

Note that the quantum entropy is continuous on the compact set of quantum states, hence $\chi$ is continuous in $\pi$ and the maximum in (5.9) is indeed achieved. Let us introduce the notation

$$\bar S_\pi = \sum_x \pi_x S_x \tag{5.10}$$
for the average output state. In general,

$$\chi(\pi) \le H(\bar S_\pi),$$

with equality in the case of the pure-state channel $S_x = |\psi_x\rangle\langle\psi_x|$. Since the maximum possible value of the entropy is $\log \dim \mathcal{H}$, this theorem implies an absolute upper bound on the classical capacity:

$$C \le \log \dim \mathcal{H}. \tag{5.11}$$

Thus, in spite of the fact that in a Hilbert space there are infinitely many different pure states, the finite-dimensional quantum carrier cannot be used to transmit an unlimited amount of classical information. The upper bound (5.11) is achieved for the channel with orthogonal pure states $S_x = |e_x\rangle\langle e_x|$; $x = 1, \dots, d$, where $\{e_x\}$ is an orthonormal basis in $\mathcal{H}$, $d = \dim \mathcal{H}$, and for the uniform distribution $\pi_x = 1/d$. Note that such states, as a rule, cannot be obtained at the output of a realistic communication channel. It is remarkable, however, that orthogonality of the output states is not necessary in order to attain the capacity of the ideal quantum channel.

Example 5.5. Consider the trine channel with the three equiangular pure states $\psi_0$, $\psi_1$, $\psi_2$ in a two-dimensional Hilbert space as in Example 2.27. Taking the equal probabilities $\pi_x = \frac{1}{3}$, we obtain the chaotic average output state

$$\bar S_\pi = \sum_{x=0}^{2} \pi_x |\psi_x\rangle\langle\psi_x| = \frac{1}{2} I, \tag{5.12}$$

implying $C_\chi = \log 2 = 1$ bit, i.e. the capacity of the ideal channel.

More generally, if $\{\psi_x\}$ is an arbitrary overcomplete system in a Hilbert space $\mathcal{H}$, the pure state channel $x \to \frac{|\psi_x\rangle\langle\psi_x|}{\langle\psi_x|\psi_x\rangle}$ has the maximum possible capacity $\log d$, and the maximum is attained on the distribution $\pi_x = \langle\psi_x|\psi_x\rangle / d$.
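The computation in Example 5.5 can be reproduced numerically. The sketch below rests on an assumption not visible in this section: the trine vectors are taken as real unit vectors in the plane at mutual angles of $120°$, $\psi_x = (\cos(2\pi x/3), \sin(2\pi x/3))$. All function names are hypothetical, and only $2 \times 2$ real symmetric matrices are handled.

```python
import math


def projector(theta):
    """|psi><psi| for the real unit vector psi = (cos theta, sin theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, c * s], [c * s, s * s]]


def mix(weights, states):
    """Convex combination sum_x pi_x S_x of 2x2 matrices."""
    return [[sum(w * S[i][j] for w, S in zip(weights, states))
             for j in range(2)] for i in range(2)]


def eigenvalues(S):
    """Eigenvalues of a real symmetric 2x2 matrix."""
    mean = (S[0][0] + S[1][1]) / 2
    radius = math.hypot((S[0][0] - S[1][1]) / 2, S[0][1])
    return mean - radius, mean + radius


def entropy(S):
    """von Neumann entropy H(S) in bits, as in (5.7)."""
    return -sum(s * math.log2(s) for s in eigenvalues(S) if s > 1e-15)


trine = [projector(2 * math.pi * x / 3) for x in range(3)]
average_state = mix([1 / 3] * 3, trine)
chi = entropy(average_state) - sum(entropy(S) / 3 for S in trine)
print(average_state)  # close to [[0.5, 0], [0, 0.5]]
print(chi)            # close to 1 bit
```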

Example 5.6. Consider a binary channel with two pure states with unit vectors $\psi_0$, $\psi_1$ as in Example 2.25. In this case,

$$C_\chi = h_2\left(\frac{1 + \varepsilon}{2}\right), \tag{5.13}$$

where $\varepsilon = |\langle\psi_0|\psi_1\rangle|$. Indeed, the entropy $H(\bar S_\pi)$ is maximized by the uniform distribution $\pi_0 = \pi_1 = \frac{1}{2}$ since, by concavity of the quantum entropy (see Corollary 7.17 in Chapter 7),

$$H\left(\tfrac{1}{2}|\psi_0\rangle\langle\psi_0| + \tfrac{1}{2}|\psi_1\rangle\langle\psi_1|\right) = H\left(\frac{S + S'}{2}\right) \ge \frac{1}{2}\left(H(S) + H(S')\right),$$

where

$$S = \pi_0 |\psi_0\rangle\langle\psi_0| + \pi_1 |\psi_1\rangle\langle\psi_1|, \qquad S' = \pi_1 |\psi_0\rangle\langle\psi_0| + \pi_0 |\psi_1\rangle\langle\psi_1|,$$

and $H(S') = H(S)$ due to the reflection symmetry with respect to the bisector of the angle between the two vectors. Similar to the solution of Exercise 2.26, the eigenvalues of the density operator $\frac{1}{2}|\psi_0\rangle\langle\psi_0| + \frac{1}{2}|\psi_1\rangle\langle\psi_1|$ are equal to $\frac{1 \pm \varepsilon}{2}$, whence (5.13) follows.
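A small numerical check of Example 5.6 is sketched below. It assumes, for illustration, that the two states are real unit vectors $\psi_{0,1} = (\cos(\alpha/2), \pm\sin(\alpha/2))$ separated by the angle $\alpha$, so that $\langle\psi_0|\psi_1\rangle = \cos\alpha$; the function names are hypothetical. Since the states are pure, $\chi(\pi)$ reduces to the entropy of the average state.

```python
import math


def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def chi_binary(alpha, pi0):
    """chi(pi) for two pure states at angle alpha with prior (pi0, 1 - pi0).

    The average state is [[c^2, (2 pi0 - 1) c s], [(2 pi0 - 1) c s, s^2]]
    with c = cos(alpha/2), s = sin(alpha/2); its eigenvalues are 1/2 +- r.
    """
    c, s = math.cos(alpha / 2), math.sin(alpha / 2)
    r = math.hypot((c * c - s * s) / 2, (2 * pi0 - 1) * c * s)
    return h2(min(1.0, 0.5 + r))


alpha = 1.0                      # illustrative angle between the vectors
overlap = abs(math.cos(alpha))   # epsilon = |<psi_0|psi_1>|
print(chi_binary(alpha, 0.5), h2((1 + overlap) / 2))  # both sides of (5.13)
print(max(chi_binary(alpha, k / 100) for k in range(101)))  # maximum over a grid
```

On the grid of priors, the maximum is attained at $\pi_0 = \frac{1}{2}$, in agreement with the concavity argument of the example.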


5.3 The upper bound 

Consider an arbitrary c-q channel $x \to S_x$ and an observable $M = \{M_y\}$ at its output. Note that the alphabets $\mathcal{X}$ and $\mathcal{Y}$ need not coincide. The variables $x$ and $y$ are related by the classical channel $p_M(y|x) = \mathrm{Tr}\, S_x M_y$. Denote by

$$\mathcal{I}_1(\pi, M) = \sum_{x, y} \pi_x\, p_M(y|x) \log\left(\frac{p_M(y|x)}{\sum_{x'} p_M(y|x')\, \pi_{x'}}\right) \tag{5.14}$$

the Shannon information between the variables $x$ and $y$ corresponding to some input distribution $\pi = \{\pi_x\}$.

Exercise 5.7. For given $\pi$ and $\mathcal{Y}$ the information quantity $\mathcal{I}_1(\pi, M)$ is a continuous and convex function of the transition probability $p_M(y|x)$, and hence of $M$. Hint: use the corresponding property of the Shannon information $I(X; Y)$.

For a given distribution $\pi$ on the input alphabet, consider the accessible information

$$A(\pi) = \sup_M \mathcal{I}_1(\pi, M),$$

where the supremum is taken over all possible observables at the output of the channel $x \to S_x$.

Proposition 5.8 (Davies [47]). The supremum of the information quantity $\mathcal{I}_1(\pi, M)$ is attained by an observable $M^0$ of the form

$$M_y = |\phi_y\rangle\langle\phi_y|, \quad y = 1, \dots, m, \tag{5.15}$$

where $m \le d^2$ in the case of complex $\mathcal{H}$ ($m \le \frac{d(d+1)}{2}$ in the case of real $\mathcal{H}$).

Proof. For any $k = 2, 3, \dots$ the set $\mathfrak{M}_k$ of observables with $k$ outcomes is compact, and the continuous convex functional $\mathcal{I}_1(\pi, M)$ attains its maximum on the set $\mathfrak{M}_k$. Assume that $\tilde M^0$ maximizes $\mathcal{I}_1(\pi, M)$ on $\mathfrak{M}_k$. By throwing off, if necessary, zero components, we obtain the observable $\hat M^0 \in \mathfrak{M}_l$, $l \le k$, for which $\mathcal{I}_1(\pi, \hat M^0) = \mathcal{I}_1(\pi, \tilde M^0)$. By making the spectral decomposition of the components of the observable $\hat M^0$ into rank one operators, we can construct the new observable $M^0$ of the form (5.15), with $m \ge l$, for which $\hat M^0$ is a coarse-graining,

$$\underbrace{M_1, M_2, \dots}_{= \hat M_1},\ \underbrace{\dots}_{= \hat M_2},\ \dots,\ \underbrace{\dots, M_m}_{= \hat M_l},$$

where each group of rank one operators sums to the indicated component of $\hat M^0$. Then $\mathcal{I}_1(\pi, M^0) \ge \mathcal{I}_1(\pi, \hat M^0)$. In fact, this is a statement about the classical Shannon information, $I(X; f(Y)) \le I(X; Y)$ or, equivalently, $H(X|f(Y)) \ge H(X|Y)$, which follows from the monotonicity of the classical conditional entropy, since $H(X|Y) = H(X|f(Y), Y)$.

The function $M \to \mathcal{I}_1(\pi, M)$ is convex. Therefore, we can assume that the maximizing observable $M^0 = \{M_y\}$ is an extreme point of the set $\mathfrak{M}_m$. Then, according to Theorem 2.21, the operators $M_y$ are linearly independent and Exercise 1.5 implies the estimate $m \le d^2$. Since $k$ was arbitrary, it follows that $\sup_M \mathcal{I}_1(\pi, M)$ is attained on an observable from the compact convex set $\mathfrak{M}_{d^2}$. □

Theorem 5.9 (Holevo [95]). For an arbitrary distribution $\pi$

$$A(\pi) = \max_M \mathcal{I}_1(\pi, M) \le \chi(\pi), \tag{5.16}$$

with equality attained if and only if the operators $\pi_x S_x$; $x \in \mathcal{X}$ all commute.
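The bound (5.16) can be illustrated for two equiprobable pure states in a real two-dimensional space. The sketch below is only an illustration under explicit assumptions: the states are $\psi_{0,1} = (\cos(\alpha/2), \pm\sin(\alpha/2))$, the measurements are restricted to orthonormal bases $e_0 = (\cos\varphi, \sin\varphi)$, $e_1 = (-\sin\varphi, \cos\varphi)$ (so the maximum over this family is only a lower bound on $A(\pi)$), and all names are hypothetical.

```python
import math


def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def shannon_information(alpha, phi):
    """I_1(pi, M) for pi = (1/2, 1/2) and the basis rotated by phi."""
    p0 = math.cos(phi - alpha / 2) ** 2  # P(y = 0 | x = 0) = |<e_0|psi_0>|^2
    p1 = math.cos(phi + alpha / 2) ** 2  # P(y = 0 | x = 1) = |<e_0|psi_1>|^2
    return h2((p0 + p1) / 2) - (h2(p0) + h2(p1)) / 2


def chi(alpha):
    """chi(pi) for the uniform prior: h2((1 + |<psi_0|psi_1>|) / 2)."""
    return h2((1 + abs(math.cos(alpha))) / 2)


alpha = math.pi / 3  # illustrative angle; the overlap is cos(alpha) = 1/2
best = max(shannon_information(alpha, k * math.pi / 360) for k in range(361))
print(best, chi(alpha))  # the first number stays below the second

# Orthogonal states commute, and the bound is attained by the basis phi = pi/4.
print(shannon_information(math.pi / 2, math.pi / 4), chi(math.pi / 2))
```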

In Chapter 7, we will obtain the information bound (5.16) as a corollary of the rather general monotonicity property of the quantum relative entropy. Here we give instead an outline of the original proof, based on the comparison of the convexity of the classical and quantum entropies. This proof reveals a connection between the quantum entropy and the noncommutative analog of the Fisher information, and moreover allows us to obtain a necessary and sufficient condition for equality. The latter condition will be used in the next section, in order to establish the nonclassical property of superadditivity of the Shannon information for the c-q channel.

Proof. Without loss of generality, we assume all $\pi_x > 0$. In this case, the condition for equality amounts to the commutativity of all operators $S_x$. Now, the equality in (5.16) is apparently achieved by the common spectral measure $M$ of the operators $S_x$.

To prove the inequality, let us first consider the case of two states $S_0$, $S_1$. Denote

$$\chi(t) = H((1 - t) S_0 + t S_1) - (1 - t) H(S_0) - t H(S_1), \quad t \in [0, 1]. \tag{5.17}$$

Also set $S_t = (1 - t) S_0 + t S_1$, $D = S_1 - S_0$ and let $S_t = \sum_k s_k E_k$ be the spectral decomposition of the operator $S_t$. Here and in what follows, $s_k$ will denote strictly positive eigenvalues.
By using Cauchy’s integral formula, we have

$$S_t \log S_t = \frac{1}{2\pi i} \oint (Iz - S_t)^{-1}\, z \log z\, dz,$$

where $z \log z$ is the branch of the function analytic in the right half-plane, and the integral is taken over a closed contour embracing the segment $[\epsilon, 1]$, $\epsilon > 0$, which contains all positive eigenvalues of $S_t$. Differentiating twice with respect to $t$,

$$[S_t \log S_t]'' = \frac{1}{2\pi i} \oint [(Iz - S_t)^{-1}]''\, z \log z\, dz,$$

where, using the resolvent expansion and taking into account that $S_t' = D$,

$$[(Iz - S_t)^{-1}]'' = 2 (Iz - S_t)^{-1} D (Iz - S_t)^{-1} D (Iz - S_t)^{-1}.$$

By using the fact that $\mathrm{Tr}\, E_k D E_j D = \mathrm{Tr}\, E_j D E_k D$ due to (1.11), we have

$$\begin{aligned}
\chi''(t) &= -\frac{1}{\pi i} \oint \mathrm{Tr}\, (Iz - S_t)^{-2} D (Iz - S_t)^{-1} D\, z \log z\, dz \\
&= -\frac{1}{\pi i} \sum_{k,j} \oint \frac{\mathrm{Tr}\, E_k D E_j D}{(z - s_k)^2 (z - s_j)}\, z \log z\, dz \\
&= -\sum_{k,j} (\mathrm{Tr}\, E_k D E_j D)\, f(s_k, s_j),
\end{aligned} \tag{5.18}$$

where

$$f(a, b) = \frac{1}{2\pi i} \oint \left[\frac{1}{(z - a)^2 (z - b)} + \frac{1}{(z - b)^2 (z - a)}\right] z \log z\, dz = \frac{\log a - \log b}{a - b}, \quad a \ne b; \qquad f(a, a) = a^{-1}. \tag{5.19}$$

Note that $\mathrm{Tr}\, E_k D E_j D = \mathrm{Tr}\, E_k D E_j D E_k \ge 0$. By using the inequality

$$f(a, b) \ge \frac{2}{a + b}, \quad 0 < a \le 1,\ 0 < b \le 1, \tag{5.20}$$

in which equality is obtained if and only if $a = b$, we have

$$\chi''(t) \le -\sum_{k,j} \mathrm{Tr}\, E_k D E_j D\, \frac{2}{s_k + s_j}, \tag{5.21}$$

with equality attained if and only if $\mathrm{Tr}\, E_k D E_j D = 0$ for all $k \ne j$. The latter condition is equivalent to $[D, S_t] = 0$, i.e. $[S_0, S_1] = 0$, because of the identity

$$\mathrm{Tr}\, [D, S_t]^* [D, S_t] = \sum_{k,j} (s_k - s_j)^2\, \mathrm{Tr}\, E_k D E_j D.$$
Exercise 5.10. Prove relation (5.19) and inequality (5.20).

Exercise 5.11. Show that the operator

$$L_t = \sum_{k,j} E_k D E_j\, \frac{2}{s_k + s_j}$$

is a solution of the equation

$$S_t \circ L_t \equiv \frac{1}{2} [S_t L_t + L_t S_t] = D.$$

Moreover,

$$\sum_{k,j} \mathrm{Tr}\, E_k D E_j D\, \frac{2}{s_k + s_j} = \mathrm{Tr}\, D L_t = \mathrm{Tr}\, S_t L_t^2. \tag{5.22}$$

The operator $L_t$ is a noncommutative (symmetric) analog of the logarithmic derivative of the operator-valued function $S_t$, while (5.22) is an analog of the Fisher information quantity in mathematical statistics. From (5.21), (5.22) it follows that

$$\chi''(t) \le -\mathrm{Tr}\, D L_t = -\mathrm{Tr}\, S_t L_t^2, \quad 0 < t < 1, \tag{5.23}$$

in particular $\chi''(t) \le 0$, so that $\chi(t)$ is concave on the segment $[0, 1]$ with $\chi(0) = \chi(1) = 0$. Moreover, equality in (5.23) is achieved if and only if $[S_0, S_1] = 0$.

Now let $M = \{M_y\}$ be an arbitrary observable, let $P_t(y) = \mathrm{Tr}\, S_t M_y = (1-t) P_0(y) + t P_1(y)$ be its distribution in the state $S_t$, and $J_M(t) = \mathcal{I}_1(\pi, M)$, where $\pi = \{1-t, t\}$. In other words,

$$J_M(t) = H((1-t) P_0 + t P_1) - (1-t) H(P_0) - t H(P_1),$$


where $H(P)$ is the Shannon entropy of the probability distribution $P$. Also denote $D(y) = P_1(y) - P_0(y)$. Applying the previous argument to the diagonal matrix $\mathrm{diag}[P_t(y)]$ instead of the state $S_t$, we obtain

$$J_M''(t) = -\sum_y \frac{D(y)^2}{P_t(y)} = -\mathrm{Tr}\, D \Lambda_t, \tag{5.24}$$

where

$$\Lambda_t = \sum_y M_y \frac{D(y)}{P_t(y)}.$$


The first equality in (5.24) holds due to commutativity of the diagonal matrices. The right-hand side of (5.24) is equal to minus the classical Fisher information for the family of measurement probability distributions. We have $J_M''(t) \le 0$, so that the function $J_M(t)$ is concave on $[0,1]$, and $J_M(0) = J_M(1) = 0$.
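The first equality in (5.24) can be illustrated with a finite-difference check. The sketch below is not part of the original argument: the two outcome distributions are arbitrary example data, the helper names are hypothetical, and entropies are taken in nats (natural logarithm) so that no factor $1/\ln 2$ appears in the second derivative.

```python
import math


def entropy_nats(p):
    """Shannon entropy with the natural logarithm (an assumption of this sketch)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def j_m(t, p0, p1):
    """J_M(t) = H((1-t)P0 + tP1) - (1-t)H(P0) - tH(P1)."""
    mix = [(1 - t) * a + t * b for a, b in zip(p0, p1)]
    return entropy_nats(mix) - (1 - t) * entropy_nats(p0) - t * entropy_nats(p1)


def classical_fisher(t, p0, p1):
    """Sum over y of D(y)^2 / P_t(y), with D = P1 - P0."""
    return sum((b - a) ** 2 / ((1 - t) * a + t * b) for a, b in zip(p0, p1))


# Example outcome distributions (illustrative values only).
p0 = [0.7, 0.2, 0.1]
p1 = [0.1, 0.3, 0.6]
t, h = 0.4, 1e-4

# Central second difference approximating J_M''(t).
second_diff = (j_m(t + h, p0, p1) - 2 * j_m(t, p0, p1) + j_m(t - h, p0, p1)) / h ** 2
print(second_diff, -classical_fisher(t, p0, p1))
```

The two printed numbers should agree up to discretization error, and both are negative, in line with the concavity of $J_M(t)$.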

Finally, let us prove the inequality between the classical and the quantum Fisher informations

$$\mathrm{Tr}\, D \Lambda_t \le \mathrm{Tr}\, D L_t, \tag{5.25}$$

which implies, via (5.23), (5.24), that $J_M''(t) \ge \chi''(t)$, and hence, since both functions vanish at $t = 0$ and $t = 1$,

$$J_M(t) \le \chi(t), \qquad 0 \le t \le 1, \tag{5.26}$$

with strict inequality if $[S_0, S_1] \neq 0$.

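Proof of the inequality (5.25). First, note that

$$\sum_y \frac{D(y)^2}{P_t(y)} \ge \mathrm{Tr}\, S_t \Lambda_t^2, \tag{5.27}$$

since

$$\Lambda_t^2 \le \sum_y M_y \left[\frac{D(y)}{P_t(y)}\right]^2$$

by the following operator generalization of the Cauchy-Schwarz inequality:

Exercise 5.12. Prove the following statement: given a resolution of the identity $\{M_y\}$, one has for any real $c_y$

$$\Big(\sum_y c_y M_y\Big)^2 \le \sum_y c_y^2 M_y.$$

Now by using (5.27), we obtain

$$\begin{aligned}
\mathrm{Tr}\, D L_t = \mathrm{Tr}\, S_t L_t^2 &= \mathrm{Tr}\, S_t [\Lambda_t + (L_t - \Lambda_t)]^2 \ge \mathrm{Tr}\, S_t \Lambda_t^2 + 2\, \mathrm{Tr}\, S_t (L_t - \Lambda_t) \circ \Lambda_t \\
&= -\mathrm{Tr}\, S_t \Lambda_t^2 + 2\, \mathrm{Tr}\, (S_t \circ L_t) \Lambda_t = 2\, \mathrm{Tr}\, D \Lambda_t - \mathrm{Tr}\, S_t \Lambda_t^2 \\
&= \mathrm{Tr}\, D \Lambda_t + [\mathrm{Tr}\, D \Lambda_t - \mathrm{Tr}\, S_t \Lambda_t^2] \ge \mathrm{Tr}\, D \Lambda_t.
\end{aligned}$$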

This completes the proof of Theorem 5.9 in the case of two states. The case of several states $S_x$; $x = 0, 1, \dots, k$, with distribution $\pi = \{\pi_x;\ x = 0, 1, \dots, k\}$ is reduced to the case of two states with the help of the following recurrent formula:

Exercise 5.13.

$$\chi(\pi) = \sum_{m=1}^{k} (\pi_0 + \dots + \pi_m)\, \chi_m(t_m),$$

where

$$\chi_m(t) = H((1-t) S^m + t S_m) - (1-t) H(S^m) - t H(S_m),$$

$$t_m = \frac{\pi_m}{\pi_0 + \dots + \pi_m}, \qquad S^m = \sum_{j=0}^{m-1} \frac{\pi_j S_j}{\pi_0 + \dots + \pi_{m-1}}.$$


Similarly

$$\mathcal{I}_1(\pi, M) = \sum_{m=1}^{k} (\pi_0 + \dots + \pi_m)\, J_M^m(t_m),$$

where $J_M^m(t_m)$ are defined in the same way as $\chi_m(t_m)$, via the Shannon entropies of the measurement probability distributions. From the proven result for two states, $J_M^m(t_m) \le \chi_m(t_m)$. Hence, the inequality (5.16) follows. If there is at least a pair of noncommuting states, we can take them as $S_0, S_1$, and the strict inequality follows. □
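The recurrent formula of Exercise 5.13 can be sanity-checked in the commuting case, where every state is a diagonal density matrix and the von Neumann entropy reduces to the Shannon entropy of the diagonal. The sketch below makes exactly that simplifying assumption; the function names and the example data are illustrative and not taken from the text.

```python
import math


def shannon(p):
    """Entropy in bits of a probability vector (diagonal of a commuting state)."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def mix(t, a, b):
    """Convex combination (1 - t) * a + t * b of two probability vectors."""
    return [(1 - t) * u + t * v for u, v in zip(a, b)]


def chi_direct(pi, states):
    """chi(pi) = H(average state) - sum_x pi_x H(S_x)."""
    dim = len(states[0])
    avg = [sum(w * s[i] for w, s in zip(pi, states)) for i in range(dim)]
    return shannon(avg) - sum(w * shannon(s) for w, s in zip(pi, states))


def chi_recurrent(pi, states):
    """Sum over m of (pi_0 + ... + pi_m) * chi_m(t_m); requires pi[0] > 0."""
    total = 0.0
    weight = pi[0]             # pi_0 + ... + pi_{m-1}
    partial = list(states[0])  # normalized partial mixture S^m
    for m in range(1, len(pi)):
        new_weight = weight + pi[m]
        t_m = pi[m] / new_weight
        merged = mix(t_m, partial, states[m])
        chi_m = shannon(merged) - (1 - t_m) * shannon(partial) - t_m * shannon(states[m])
        total += new_weight * chi_m
        weight, partial = new_weight, merged
    return total


pi = [0.2, 0.5, 0.3]
states = [[0.9, 0.1, 0.0], [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]]
print(chi_direct(pi, states), chi_recurrent(pi, states))
```

Both evaluations should coincide up to rounding, because the sum telescopes; the same telescoping underlies the noncommutative statement.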


5.4 Proof of the weak converse 


The bound (5.16) is the main tool for proving the weak converse of the Quantum Coding Theorem, which implies $C \le C_\chi$.


Theorem 5.14 (Weak Converse). If $R > C_\chi$, then

$$\liminf_{n \to \infty} p_e(n, 2^{nR}) > 0. \tag{5.28}$$

Proof. Consider the composite, memoryless c-q channel $w \to S_w$, where $w = (x_1, \dots, x_n)$ denote words of length $n$. Let $\mathcal{I}_n(\pi^{(n)}, M^{(n)})$ be the Shannon information defined for this composite channel similarly to $\mathcal{I}_1(\pi, M)$, where $\pi^{(n)} = \{\pi_w\}$ is a probability distribution of codewords of length $n$ and $M^{(n)}$ is an observable in $\mathcal{H}^{\otimes n}$. Then (5.16) implies

$$\mathcal{I}_n(\pi^{(n)}, M^{(n)}) \le C_\chi^{(n)}, \tag{5.29}$$

where

$$C_\chi^{(n)} = \max_{\pi^{(n)}} \chi(\{\pi_w\}; \{S_w\}) = \max_{\pi^{(n)}} \left[ H\Big(\sum_w \pi_w S_w\Big) - \sum_w \pi_w H(S_w) \right]. \tag{5.30}$$


Lemma 5.15. The sequence $C_\chi^{(n)}$ is additive, i.e. $C_\chi^{(n)} = n C_\chi$.

Proof. The inequality $C_\chi^{(n)} \ge n C_\chi$ follows from the additivity of the entropy by using the product input probability distributions $\pi^{(n)}$. The proof of the converse inequality $C_\chi^{(n)} \le n C_\chi$ follows from subadditivity of the von Neumann entropy with respect to the tensor products (see Corollary 7.4), which implies

$$\chi_n(\pi^{(n)}) \le \sum_{k=1}^{n} \chi(\pi^{(n,k)}), \tag{5.31}$$

where $\pi^{(n,k)}$ is the $k$-th marginal distribution of $\pi^{(n)}$ on $\mathcal{X}$. □


Defining

$$C_n = \max_{\pi^{(n)}, M^{(n)}} \mathcal{I}_n(\pi^{(n)}, M^{(n)}), \tag{5.32}$$

we thus have $C_n \le n C_\chi$ and, applying the classical Fano's inequality (4.34), we obtain, similar to (4.38),

$$P_e(W, M) \ge 1 - \frac{C_n}{nR} - \frac{1}{nR} \ge 1 - \frac{C_\chi}{R} - \frac{1}{nR}, \tag{5.33}$$

and hence, (5.28) follows.

□
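To see how the right-hand side of (5.33) behaves, the following sketch evaluates it for a few block lengths. The numerical values of the rate and of $C_\chi$ are arbitrary illustrative choices, and the function name is hypothetical.

```python
def weak_converse_bound(rate: float, c_chi: float, n: int) -> float:
    """Lower bound 1 - C_chi / R - 1 / (n R) on the error probability, cf. (5.33)."""
    return 1.0 - c_chi / rate - 1.0 / (n * rate)


# Illustrative numbers: a rate 20% above an assumed capacity C_chi = 1 bit.
rate, c_chi = 1.2, 1.0
for n in (10, 100, 1000):
    print(n, weak_converse_bound(rate, c_chi, n))
```

For $R > C_\chi$ the bound increases with $n$ towards $1 - C_\chi / R > 0$, which is the content of (5.28); for small $n$ it may be negative and then carries no information.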


The quantity $C_n$ is the maximal Shannon information accessible through $n$ uses of the c-q channel when arbitrary decodings are allowed on the quantum output $\mathcal{H}^{\otimes n}$. The following result clarifies its relation to the classical capacity.


Proposition 5.16. The classical capacity $C$ of the channel $x \to S_x$ is equal to

$$\sup_n \frac{1}{n} C_n = \lim_{n \to \infty} \frac{1}{n} C_n. \tag{5.34}$$


Proof. The equality in (5.34) follows from superadditivity of the sequence $\{C_n\}$:

Exercise 5.17. Show that the sequence $C_n$ is superadditive, i.e. $C_{n+m} \ge C_n + C_m$, by taking product probability distributions $\pi^{(n+m)} = \pi^{(n)} \times \pi^{(m)}$ and the corresponding product observables.

The inequality $C \le \sup_n \frac{1}{n} C_n$ follows from the first relation in (5.33). The opposite inequality $C \ge \sup_n \frac{1}{n} C_n$ follows from the direct statement of Shannon's Coding Theorem 4.13 applied to composite channels. Indeed, let us show that any $R < \sup_n \frac{1}{n} C_n$ is an achievable rate. We have $n_0 R < C_{n_0}$ for some $n_0$. Hence, we can choose an observable $M^{(n_0)}$ in $\mathcal{H}^{\otimes n_0}$ such that

$$n_0 R < \max_{\pi^{(n_0)}} \mathcal{I}_{n_0}(\pi^{(n_0)}, M^{(n_0)}) \equiv C(M^{(n_0)}).$$

The quantity $C(M^{(n_0)})$ is the Shannon capacity of the classical channel

$$p_{M^{(n_0)}}(y \,|\, x^{n_0}) = \mathrm{Tr}\, S_{x^{n_0}} M_y^{(n_0)}. \tag{5.35}$$
Therefore, the minimal error probability for this channel satisfies $p_e(n, 2^{n(n_0 R)}) \to 0$ as $n \to \infty$. Evidently, the error probability $p_e(n n_0, 2^{n(n_0 R)})$ for the initial c-q channel does not exceed the one for the classical channel, since the channel (5.35) corresponds to a certain block decoding for the initial c-q channel. Thus, $p_e(n n_0, 2^{n(n_0 R)}) \to 0$ as $n \to \infty$. For arbitrary $n'$, one can find $n$ such that $n n_0 \le n' \le (n+1) n_0$. Then

$$p_e(n', 2^{n'R}) \le p_e(n n_0, 2^{(n+1) n_0 R}) \le p_e(n n_0, 2^{n(n_0 R')}) \to 0, \tag{5.36}$$

if $R'$ is chosen such that $R(1 + 1/n) \le R' < \sup_n \frac{1}{n} C_n$ for sufficiently large $n$. □
The quantity

$$C_1 = \max_{\pi, M} \mathcal{I}_1(\pi, M),$$

which we call the Shannon capacity of the c-q channel $x \to S_x$, is of special interest. Similar to the proof of Proposition 5.16, it can be shown to be equal to the capacity of the c-q channel $x \to S_x$, with the additional restriction that only product observables $\{M_{y_1} \otimes \cdots \otimes M_{y_n}\}$ are allowed at the output of the composite channel. More generally, we call the decoding $M^{(n)} = \{M_j^{(n)}\}$ in $\mathcal{H}^{\otimes n}$ unentangled, if

$$M_j^{(n)} = \sum_{y^n} p(j \,|\, y^n)\, M^1_{y_1} \otimes \cdots \otimes M^n_{y_n}, \tag{5.37}$$

where $\{M^k_y\}$ is a quantum observable for the $k$-th copy of the space $\mathcal{H}$, and $p(j \,|\, y^n)$ is a conditional probability describing the classical (stochastic) transformation of the measurement outcomes of a product observable

$$\{M^1_{y_1} \otimes \cdots \otimes M^n_{y_n}\} \tag{5.38}$$

at the output.

Exercise 5.18. Denoting by $C_{ue}$ the supremum of achievable rates for the codes $(W, M)$, with the additional constraint (5.37) on the decoding $M$, show that $C_{ue} = C_1$. Hint: the effect of the decoding (5.37) can be described by concatenation of the classical composite channel corresponding to the product observable (5.38) and the channel $p(j \,|\, y^n)$. Use the data-processing inequality (4.20) and the analog of the first inequality in (5.33) to prove that $C_{ue} \le C_1$. For the proof of the converse inequality, use the direct statement of the classical coding theorem for the channel $p_M(y \,|\, x) = \mathrm{Tr}\, S_x M_y$ to prove that any rate $R < \max_\pi \mathcal{I}_1(\pi, M)$ is achievable. Since the decoding $M$ in $\mathcal{H}$ is arbitrary, it follows that any rate $R < C_1$ is achievable. Hence, $C_1 \le C_{ue}$.

So far, we have shown that

$$C_1 \le C \le C_\chi.$$
The direct statement of the coding theorem, which will be proven below, implies $C = C_\chi$. The following result means that $C_1 < C$ for channels with essentially quantum output. Thus, for the memoryless c-q channels the capacity can be increased by using entangled decodings. It follows that, while $C_n = n C_1$ for a classical memoryless channel (see Proposition 4.9), in the quantum case strict superadditivity of the classical information $C_n > n C_1$ is possible, the reason being the existence of entangled observables at the output of the composite quantum channel. One can say that this is a dual manifestation of the EPR correlations. The latter arise when the state of a composite quantum system is entangled, while the measurements are local, i.e. unentangled. Here, the strict superadditivity of the information appears for the product states and entangled measurements.

Proposition 5.19. The equality $C_1 = C$ holds if and only if the operators $\pi_x^0 S_x$ commute for a probability distribution $\pi^0$ that maximizes $\chi(\pi)$.

Proof. The sufficiency of the condition is obvious. Conversely, let $C_1 = C$. In this case, by using Proposition 5.8, compactness of the set of probability distributions $\{\pi\}$ and continuity of the Shannon information $\mathcal{I}_1(\pi, M)$, we have $C_1 = \mathcal{I}_1(\pi^1, M^1)$ for some distribution $\pi^1$ and observable $M^1$. Hence, $\mathcal{I}_1(\pi^1, M^1) = \chi(\pi^0)$. It follows that $\chi(\pi^1) = \chi(\pi^0)$, otherwise, by using inequality (5.16), we would have

$$\mathcal{I}_1(\pi^1, M^1) \le \chi(\pi^1) < \chi(\pi^0).$$

Thus, we can replace $\pi^0$ with $\pi^1$, $\mathcal{I}_1(\pi^1, M^1) = \chi(\pi^1)$, and necessity follows from the second statement of Theorem 5.9. □

The following examples (more detailed references and discussion are given in Section 5.8) illustrate the inequality $C_1 < C$.

Example 5.20. For the trine channel,

$$C_1 = 1 - h_2\!\left(\frac{1 + \sqrt{3}/2}{2}\right) \approx 0.645 \text{ bit}, \tag{5.39}$$

which is obtained for the non-uniform distribution $\pi = [1/2, 1/2, 0]$ and for the optimal measurement for the two equiprobable states $\psi_0, \psi_1$, see Sasaki et al. [176]. Thus, $C_1 < C$ and the sequence $\{C_n\}$ is strictly superadditive. Note that taking the uniform distribution $\pi = [1/3, 1/3, 1/3]$ and the information optimal observable results in the smaller quantity $\log(3/2) \approx 0.585$ bit.
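The two numbers quoted in Example 5.20 can be reproduced with a short computation. The sketch below rests on assumptions that the example itself does not spell out: the trine states are taken as real unit vectors at angles $0$, $2\pi/3$, $4\pi/3$ in the plane, the observable for the uniform distribution is taken to be the "anti-trine" with components $\frac{2}{3}|\varphi_j\rangle\langle\varphi_j|$, $\varphi_j \perp \psi_j$, and the two-state value uses the binary symmetric channel description from Example 5.21 with overlap $1/2$. All function names are illustrative.

```python
import math


def h2(p: float) -> float:
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def mutual_information(prior, channel):
    """Shannon information in bits for channel[x][y] = p(y | x)."""
    n_out = len(channel[0])
    out = [sum(prior[x] * channel[x][y] for x in range(len(prior))) for y in range(n_out)]
    info = 0.0
    for x, px in enumerate(prior):
        for y, pyx in enumerate(channel[x]):
            if px > 0 and pyx > 0:
                info += px * pyx * math.log2(pyx / out[y])
    return info


# Two equiprobable trine states: overlap 1/2, i.e. angle alpha = pi / 3.
alpha = math.pi / 3
c1_two_states = 1 - h2((1 - math.sin(alpha)) / 2)

# Uniform prior with the anti-trine observable: p(y | x) = (2/3) sin^2(theta_x - theta_y).
angles = [2 * math.pi * k / 3 for k in range(3)]
anti_trine = [[(2 / 3) * math.sin(a - b) ** 2 for b in angles] for a in angles]
info_uniform = mutual_information([1 / 3] * 3, anti_trine)

print(round(c1_two_states, 3), round(info_uniform, 3))
```

Under these assumptions the first value matches (5.39) and the second matches $\log(3/2)$, so the uniform distribution is indeed worse than dropping one of the three states.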


Example 5.21. The Shannon capacity of the c-q channel with two pure states is given by

$$C_1 = \max_{\pi, M} \mathcal{I}_1(\pi, M) = 1 - h_2\!\left(\frac{1 + \sqrt{1 - \varepsilon^2}}{2}\right), \tag{5.40}$$

where the maximum is achieved for the uniform distribution ($\pi_0 = \pi_1 = \frac{1}{2}$) and for the sharp observable $M$ given by the orthonormal basis situated symmetrically with respect to the vectors $|\psi_0\rangle, |\psi_1\rangle$, see Levitin [144]. Here, $\varepsilon = |\langle \psi_0 | \psi_1 \rangle| = \cos \alpha$, where $\alpha$ is the angle between the vectors. Note that in this case the optimal observable is the same as the one obtained by the maximum likelihood criterion (Example 2.26), while $C_1$ is just the capacity (4.24) of the classical binary symmetric channel, with error probability

$$p = \sin^2(\pi/4 - \alpha/2) = \frac{1 - \sin \alpha}{2}.$$

The plot of the quantities $C_1$, $C$ as functions of $\varepsilon$ is given in Figure 5.1.


Figure 5.1. The capacity of the binary pure-state channel.
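The curves of Figure 5.1 can be tabulated with a few lines of code. The sketch below uses the binary symmetric channel form of $C_1$ from Example 5.21 and, as an assumption not stated in the example, the fact that for two pure states the quantity $\chi(\pi)$ equals the entropy of the average state, which for the uniform distribution has eigenvalues $(1 \pm \varepsilon)/2$. Function names are illustrative.

```python
import math


def h2(p: float) -> float:
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def shannon_capacity_c1(eps: float) -> float:
    """C_1 for two pure states with overlap eps = cos(alpha)."""
    return 1 - h2((1 + math.sqrt(1 - eps ** 2)) / 2)


def classical_capacity_c(eps: float) -> float:
    """Entropy of the average state for the uniform input distribution."""
    return h2((1 + eps) / 2)


table = [(eps, shannon_capacity_c1(eps), classical_capacity_c(eps))
         for eps in (0.0, 0.25, 0.5, 0.75, 1.0)]
for eps, c1, c in table:
    print(f"eps={eps:.2f}  C1={c1:.4f}  C={c:.4f}")
```

The two curves coincide only at the endpoints $\varepsilon = 0$ (orthogonal states) and $\varepsilon = 1$ (identical states); in between $C_1 < C$.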


5.5 Typical projectors 


Before proving the direct statement of the coding theorem for the quantity $C_\chi$, we shall consider the important notion of the typical subspace due to Schumacher and Jozsa [125].

Consider the tensor degree of a density operator

$$S^{\otimes n} = S \otimes \cdots \otimes S$$

in the space $\mathcal{H}^{\otimes n}$. Let

$$S = \sum_j \lambda_j |e_j\rangle\langle e_j|$$

be the spectral decomposition of the operator $S$, where $\{e_j\}$ is an orthonormal basis of eigenvectors and $\lambda_j$ are the corresponding eigenvalues. Note that $\lambda_j$ form a probability distribution with entropy

$$-\sum_j \lambda_j \log \lambda_j = H(S).$$
Then the spectral decomposition of the product state

$$S^{\otimes n} = S \otimes \cdots \otimes S$$

can be written as

$$S^{\otimes n} = \sum_J \lambda_J |e_J\rangle\langle e_J|,$$

where $J = (j_1, \dots, j_n)$, with eigenvectors $|e_J\rangle = |e_{j_1}\rangle \otimes \cdots \otimes |e_{j_n}\rangle$ and eigenvalues $\lambda_J = \lambda_{j_1} \cdots \lambda_{j_n}$, describing a probability distribution of i.i.d. classical random variables. In accordance with Definition 4.2, the eigenvector $|e_J\rangle$ will be called $\delta$-typical if

$$2^{-n(H(S) + \delta)} < \lambda_J < 2^{-n(H(S) - \delta)}.$$

From the spectral decomposition of $S^{\otimes n}$ we can now construct the projector $P^{n,\delta}$ onto the typical subspace $\mathcal{H}^{n,\delta} = P^{n,\delta} \mathcal{H}^{\otimes n}$ by the relation

$$P^{n,\delta} = \sum_{J \in T^{n,\delta}} |e_J\rangle\langle e_J|.$$

We also call $P^{n,\delta}$ the typical projector.

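The properties of the typical subspace $\mathcal{H}^{n,\delta}$ are analogous to the properties of the set $T^{n,\delta}$ of typical words (see Section 4.1) and easily follow from the latter:

i. The dimensionality $\dim \mathcal{H}^{n,\delta} = \mathrm{Tr}\, P^{n,\delta} = |T^{n,\delta}|$ satisfies the inequality

$$\dim \mathcal{H}^{n,\delta} \le 2^{n(H(S) + \delta)}.$$

ii. The contribution of the non-$\delta$-typical eigenvectors to the operator $S^{\otimes n}$ can be made arbitrarily small, i.e. for given $\delta, \varepsilon > 0$ and large enough $n$

$$\mathrm{Tr}\, S^{\otimes n} (I - P^{n,\delta}) \le \varepsilon. \tag{5.41}$$

To see this, we evaluate the trace in the basis of the eigenvectors of the operator $S^{\otimes n}$, and find that it is equal to

$$\sum_{J \notin T^{n,\delta}} \lambda_J = \mathsf{P}\{J \notin T^{n,\delta}\} = \mathsf{P}\left\{\left|\frac{-\log \lambda_J}{n} - H(S)\right| \ge \delta\right\} = \mathsf{P}\left\{\left|-\frac{1}{n} \sum_{k=1}^{n} \log \lambda_{j_k} - H(S)\right| \ge \delta\right\}, \tag{5.42}$$

where $\mathsf{P}$ corresponds to the probability distribution $\{\lambda_J\}$ and, by the Law of Large Numbers, the last probability can be made arbitrarily small for $n$ large.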
iii. For sufficiently large $n$

$$(1 - \varepsilon)\, 2^{n(H(S) - \delta)} \le \dim \mathcal{H}^{n,\delta}.$$

Now, we can formulate the noncommutative version of the asymptotic equipartition property: for large $n$ the state $S^{\otimes n}$ approaches the chaotic state in the subspace $\mathcal{H}^{n,\delta}$ of dimensionality $\approx 2^{nH(S)}$.
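Properties i. to iii. can be observed numerically for a single qubit. The sketch below assumes a state $S$ with eigenvalues $\lambda$ and $1 - \lambda$, so that the eigenvalues of $S^{\otimes n}$ are $\lambda^k (1-\lambda)^{n-k}$ with multiplicity $\binom{n}{k}$; the function name and the chosen parameters are illustrative only.

```python
import math


def typical_subspace_stats(lam: float, n: int, delta: float):
    """Return (dimension, weight, entropy) for the delta-typical subspace of S^{(x)n},
    where S is a qubit state with eigenvalues lam and 1 - lam (0 < lam < 1)."""
    entropy = -lam * math.log2(lam) - (1 - lam) * math.log2(1 - lam)
    dimension, weight = 0, 0.0
    for k in range(n + 1):  # k = number of tensor factors carrying eigenvalue lam
        log_eig = k * math.log2(lam) + (n - k) * math.log2(1 - lam)
        if abs(-log_eig / n - entropy) < delta:  # delta-typicality of the eigenvalue
            multiplicity = math.comb(n, k)
            dimension += multiplicity
            weight += multiplicity * lam ** k * (1 - lam) ** (n - k)
    return dimension, weight, entropy


for n in (50, 200, 400):
    dim, weight, ent = typical_subspace_stats(0.2, n, 0.1)
    # Compare log-dimension per copy with H(S) +/- delta, and show Tr S^{(x)n} P.
    print(n, round(math.log2(dim) / n, 4), round(ent, 4), round(weight, 4))
```

As $n$ grows, the weight $\mathrm{Tr}\, S^{\otimes n} P^{n,\delta}$ approaches 1 while the number of qubits per copy, $\frac{1}{n}\log \dim \mathcal{H}^{n,\delta}$, stays within $\delta$ of $H(S)$.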

An immediate application of this property is the quantum analog of data compression. As we have seen, the dimensionality of the underlying Hilbert space reflects the potential classical information resource of the quantum system. On the other hand, e.g. in quantum computation, the logarithm of the dimensionality measures the size of the memory, i.e. the number of qubits in the register, which is the most important characteristic that one would like to minimize. Consider the problem of encoding quantum states into other states in a Hilbert space of the smallest possible dimensionality without essential loss of the “quantum information” carried by the states. To this end, we consider a quantum source that produces the pure states $S_x = |\psi_x\rangle\langle\psi_x|$ with probabilities $\pi_x$. The words $w = (x_1, \dots, x_n)$ generate the product states $S_w = |\psi_w\rangle\langle\psi_w|$, where $|\psi_w\rangle = |\psi_{x_1}\rangle \otimes \cdots \otimes |\psi_{x_n}\rangle \in \mathcal{H}^{\otimes n}$, which are then encoded in some other states $S'_w$ in a subspace $\mathcal{H}_d \subset \mathcal{H}^{\otimes n}$, the dimensionality of which $d = \dim \mathcal{H}_d$ should be as small as possible. The fidelity of the encoding is measured by the quantities $\langle \psi_w | S'_w | \psi_w \rangle$. The closer they are to 1, the better the encoding. We require that the average fidelity

$$F_n = \sum_w \pi_w \langle \psi_w | S'_w | \psi_w \rangle$$

converges to 1 as $n \to \infty$.
Let

$$\bar{S}_\pi = \sum_x \pi_x |\psi_x\rangle\langle\psi_x|$$

be the average state of the source.

The following is called the Quantum Data Compression Theorem. 

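Theorem 5.22 (Schumacher and Jozsa [125]).

i. For all small enough $\varepsilon, \delta > 0$ there exist a Hilbert space $\mathcal{H}_d \subset \mathcal{H}^{\otimes n}$ of dimensionality $d = 2^{n(H(\bar{S}_\pi) + \delta)}$ and density operators $S'_w$ in $\mathcal{H}_d$, such that the fidelity $F_n > 1 - \varepsilon$ for $n$ large enough.

ii. For any choice of the subspace $\mathcal{H}_d \subset \mathcal{H}^{\otimes n}$ with $d = 2^{n(H(\bar{S}_\pi) - \delta)}$ and of the density operators $S'_w$ in $\mathcal{H}_d$ it holds that $F_n < \varepsilon$ for $n$ large enough.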
Proof.

i. Consider the source state $|\psi_w\rangle = |\psi_{x_1}\rangle \otimes \cdots \otimes |\psi_{x_n}\rangle$. We now define the “compressed” state as

$$S'_w = \frac{P^{n,\delta} |\psi_w\rangle\langle\psi_w| P^{n,\delta}}{\langle \psi_w | P^{n,\delta} \psi_w \rangle}, \tag{5.43}$$

where $P^{n,\delta}$ is the typical projector of the state $\bar{S}_\pi^{\otimes n}$. The operator $S'_w$ acts in the typical subspace $\mathcal{H}^{n,\delta}$, which has a dimensionality of at most $d = 2^{n(H(\bar{S}_\pi) + \delta)}$. For the average fidelity we obtain

$$F_n = \sum_w \pi_w \langle \psi_w | S'_w \psi_w \rangle = \sum_w \pi_w \langle \psi_w | P^{n,\delta} \psi_w \rangle = \mathrm{Tr}\, \bar{S}_\pi^{\otimes n} P^{n,\delta},$$

because $\bar{S}_\pi^{\otimes n} = \sum_w \pi_w |\psi_w\rangle\langle\psi_w|$. Due to the second property of $P^{n,\delta}$, we can bound this quantity from below by $1 - \varepsilon$.

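ii. Taking into account the fact that the inequality $S'_w \le P_d$ holds for all $S'_w$ in $\mathcal{H}_d = P_d \mathcal{H}^{\otimes n}$, we have

$$\begin{aligned}
F_n = \sum_w \pi_w \langle \psi_w | S'_w \psi_w \rangle &\le \mathrm{Tr}\, \bar{S}_\pi^{\otimes n} P_d \\
&= \mathrm{Tr}\, \bar{S}_\pi^{\otimes n} P^{n,\delta/2} P_d + \mathrm{Tr}\, \bar{S}_\pi^{\otimes n} (I - P^{n,\delta/2}) P_d \\
&\le \|\bar{S}_\pi^{\otimes n} P^{n,\delta/2}\| \, \mathrm{Tr}\, P_d + \mathrm{Tr}\, \bar{S}_\pi^{\otimes n} (I - P^{n,\delta/2}).
\end{aligned}$$

Here we also used the following general property of the trace: if $T \ge 0$, then $\mathrm{Tr}\, TX \le \mathrm{Tr}\, T \, \|X\|$, which follows from (1.18). By the definition of typical eigenvectors, the corresponding eigenvalues are upper-bounded by

$$\|S^{\otimes n} P^{n,\delta}\| \le 2^{-n(H(S) - \delta)}. \tag{5.44}$$

By using this inequality and the property ii. of the typical projector, we obtain that for $n$ large enough the quantity $F_n$ does not exceed

$$2^{-n(H(\bar{S}_\pi) - \delta/2)}\, d + \varepsilon.$$

Thus, if we take $d = 2^{n(H(\bar{S}_\pi) - \delta)}$, then $F_n \le 2^{-n\delta/2} + \varepsilon \le 2\varepsilon$ for $n$ large enough. □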
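As a small worked example of what the theorem promises, the sketch below computes the compression rate $H(\bar{S}_\pi)$ for a source of two pure qubit states and the resulting register size $n(H(\bar{S}_\pi) + \delta)$. It assumes two states with overlap $\varepsilon = |\langle\psi_0|\psi_1\rangle|$, for which the average state has determinant $\pi_0 \pi_1 (1 - \varepsilon^2)$ and hence eigenvalues $\frac{1}{2}\left(1 \pm \sqrt{1 - 4\pi_0\pi_1(1 - \varepsilon^2)}\right)$; the function names and parameter values are illustrative.

```python
import math


def h2(p: float) -> float:
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def source_entropy(pi0: float, eps: float) -> float:
    """von Neumann entropy of pi0 |psi0><psi0| + (1 - pi0) |psi1><psi1| for overlap eps."""
    det = pi0 * (1 - pi0) * (1 - eps ** 2)
    lam = (1 + math.sqrt(1 - 4 * det)) / 2  # larger eigenvalue of the average state
    return h2(lam)


def register_qubits(pi0: float, eps: float, n: int, delta: float) -> int:
    """Number of qubits sufficient for a subspace of dimension 2^{n(H + delta)}."""
    return math.ceil(n * (source_entropy(pi0, eps) + delta))


print(source_entropy(0.5, 0.5), register_qubits(0.5, 0.5, 1000, 0.01))
```

For equiprobable states with overlap $1/2$ the rate is about $0.811$ qubits per emitted state, so a block of 1000 source qubits fits into roughly 822 qubits at $\delta = 0.01$, compared with 1000 qubits without compression.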

For the proof of the direct statement of the Coding Theorem in the next section, we will also need the notion of a conditionally typical projector. Consider the product states $S_w = S_{x_1} \otimes \cdots \otimes S_{x_n}$, where $S_x$ is now an arbitrary (not necessarily pure) state and $w = (x_1, \dots, x_n)$ is a word in the input alphabet. Denote by

$$H_\pi(S_{(\cdot)}) = \sum_x \pi_x H(S_x)$$

the quantum analog of the conditional entropy of the output with respect to the input. Let $P_w = P_w^{n,\delta}$ be the spectral projector of $S_w$ corresponding to the eigenvalues in the interval $\left(2^{-n[H_\pi(S_{(\cdot)}) + \delta]},\ 2^{-n[H_\pi(S_{(\cdot)}) - \delta]}\right)$. In more detail, consider the spectral decomposition

$$S_x = \sum_j \lambda_j^x |e_j^x\rangle\langle e_j^x|.$$

In this case, the spectral decomposition of the operator $S_w$ has the form

$$S_w = \sum_J \lambda_J^w |e_J^w\rangle\langle e_J^w|,$$

where $\lambda_J^w = \lambda_{j_1}^{x_1} \cdots \lambda_{j_n}^{x_n}$ are the eigenvalues and $|e_J^w\rangle = |e_{j_1}^{x_1}\rangle \otimes \cdots \otimes |e_{j_n}^{x_n}\rangle$ are the corresponding eigenvectors. The conditionally typical projector is defined as

$$P_w = \sum_{J \in T_w^{n,\delta}} |e_J^w\rangle\langle e_J^w|,$$

where

$$T_w^{n,\delta} = \left\{ J : 2^{-n[H_\pi(S_{(\cdot)}) + \delta]} < \lambda_J^w < 2^{-n[H_\pi(S_{(\cdot)}) - \delta]} \right\}.$$
The essential properties of the projector $P_w$ are:

i. from the definition,

$$P_w \le S_w\, 2^{n[H_\pi(S_{(\cdot)}) + \delta]}; \tag{5.45}$$

ii. for $\varepsilon > 0$ and sufficiently large $n$

$$\mathsf{E}\, \mathrm{Tr}\, S_w (I - P_w) \le \varepsilon, \tag{5.46}$$

where $\mathsf{E}$ is the expectation corresponding to the probability distribution

$$\mathsf{P}\{w = (x_1, \dots, x_n)\} = \pi_{x_1} \cdots \pi_{x_n}. \tag{5.47}$$

To prove the second property, consider the sequence of independent trials with outcomes $(x_l, j_l)$; $l = 1, \dots, n$, where the probability of an outcome $(x, j)$ is, in each trial, equal to $\pi_x \lambda_j^x$. In this case,

$$\mathsf{E}\, \mathrm{Tr}\, S_w (I - P_w) = \mathsf{P}\{J \notin T_w^{n,\delta}\}, \tag{5.48}$$

which tends to 0 as $n \to \infty$, according to the Law of Large Numbers, see Exercise 4.15.
5.6 Proof of the Direct Coding Theorem

According to the Definition 5.2, it is sufficient to show that the minimal average error probability $p_e(n, 2^{nR})$ tends to zero, as $n \to \infty$, provided $R < C_\chi$.

Consider the average state

$$\bar{S}_\pi = \sum_x \pi_x S_x$$

and the typical projector $P = P^{n,\delta}$ of the state $\bar{S}_\pi^{\otimes n}$, defined in the previous section. For a given codebook $W = \{w^{(1)}, \dots, w^{(N)}\}$, we can also define the conditionally typical projectors $P_{w^{(j)}}$, $j = 1, \dots, N$. We now introduce the special suboptimal observable $M$. The classical Shannon's Coding Theorem 4.13 corresponds to the case of diagonal density operators $S_x$ with conditional probabilities $p(y|x)$ on the diagonal. The projector $P$ is a quantum analog of the indicator of the subset of all $\delta$-typical output words, while $P_{w^{(j)}}$ is the same for the subset of all conditionally typical words, given the input word $w^{(j)}$. However, in the quantum case these projectors need not commute, which makes a straightforward generalization of the decision domains (4.43) and the corresponding measure-theoretic argument impossible. Therefore, we introduce the following observable $M$ in $\mathcal{H}^{\otimes n}$ by letting

$$M_j = \Big(\sum_{l=1}^{N} P P_{w^{(l)}} P\Big)^{-1/2} P P_{w^{(j)}} P \Big(\sum_{l=1}^{N} P P_{w^{(l)}} P\Big)^{-1/2}. \tag{5.49}$$

The normalization with the square root is necessary, because the sum of the operators $P P_{w^{(j)}} P$ may not be equal to the unit operator, and the replacement of $P_{w^{(j)}}$ by $P P_{w^{(j)}} P$ plays the role of intersection with the set of all typical words in the formula (4.44). The operator $\big(\sum_{l=1}^{N} P P_{w^{(l)}} P\big)^{-1/2}$ is to be understood as the generalized inverse of $\big(\sum_{l=1}^{N} P P_{w^{(l)}} P\big)^{1/2}$, equal to 0 on the null subspace of that operator, which contains the range of the projector $I - P$. Denoting by $\hat{P}$ the projector onto the support of $\sum_{l=1}^{N} P P_{w^{(l)}} P$, we have

$$P P_{w^{(l)}} P \le \hat{P} \le P, \qquad l = 1, \dots, N. \tag{5.50}$$

To avoid cumbersome notations, we shall further enumerate words by the variable $w$, omitting the indices $j, l$. By denoting

$$A_w = P_w P \Big(\sum_{w'=1}^{N} P P_{w'} P\Big)^{-1/2}$$

and using the Cauchy-Schwarz inequality (2.24),

$$|\mathrm{Tr}\, S_w A_w|^2 \le \mathrm{Tr}\, S_w A_w^* A_w,$$

we obtain

$$P_e(W, M) \le \frac{1}{N} \sum_{w=1}^{N} \left[1 - |\mathrm{Tr}\, S_w A_w|^2\right] \le \frac{2}{N} \sum_{w=1}^{N} \left[1 - \mathrm{Tr}\, S_w A_w\right],$$

where $\mathrm{Tr}\, S_w A_w = \mathrm{Tr}\, P S_w P_w P \big(\sum_{w'=1}^{N} P P_{w'} P\big)^{-1/2}$ is a real number between 0 and 1 (note that $S_w P_w = P_w S_w \ge 0$). Applying the inequality

$$-2x^{-1/2} \le -3 + x, \qquad x > 0, \tag{5.51}$$

we obtain by (5.50)

$$-2\Big(\sum_{w'=1}^{N} P P_{w'} P\Big)^{-1/2} \le -3\hat{P} + \sum_{w'=1}^{N} P P_{w'} P \le -3 P P_w P + \sum_{w'=1}^{N} P P_{w'} P.$$

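Therefore, by using the inequality (1.13),

$$\begin{aligned}
P_e(W, M) &\le \frac{1}{N} \sum_{w=1}^{N} \Big[ 2\, \mathrm{Tr}\, S_w - 3\, \mathrm{Tr}\, S_w P_w P P_w P + \sum_{w'=1}^{N} \mathrm{Tr}\, S_w P_w P P_{w'} P \Big] \\
&= \frac{1}{N} \sum_{w=1}^{N} \Big[ 2\, \mathrm{Tr}\, S_w (I - P_w P P_w P) + \sum_{w' : w' \neq w} \mathrm{Tr}\, S_w P_w P P_{w'} P \Big].
\end{aligned}$$

Taking into account that

$$\begin{aligned}
\mathrm{Tr}\, S_w (I - P_w P P_w P) &= \mathrm{Tr}\, S_w (I - P_w) P P_w P + \mathrm{Tr}\, S_w (I - P) P_w \\
&\quad - \mathrm{Tr}\, S_w (I - P) P_w (I - P) + \mathrm{Tr}\, S_w (I - P_w) P + \mathrm{Tr}\, S_w (I - P) \\
&\le 2 \left[ \mathrm{Tr}\, S_w (I - P_w) + \mathrm{Tr}\, S_w (I - P) \right],
\end{aligned}$$

where we used (1.13), we can write

$$P_e(W, M) \le \frac{1}{N} \sum_{w=1}^{N} \Big[ 4\, \mathrm{Tr}\, S_w (I - P) + 4\, \mathrm{Tr}\, S_w (I - P_w) + \sum_{w' : w' \neq w} \mathrm{Tr}\, P S_w P P_{w'} \Big], \tag{5.52}$$

which is our final basic estimate, similar to (4.45) in the classical case.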

We now again apply Shannon's random coding scheme, assuming that the words $w^{(1)}, \dots, w^{(N)}$ are chosen at random, independently, and with the probability distribution (5.47) for each word, where $\pi$ is the optimal distribution for which

$$H(\bar{S}_\pi) - H_\pi(S_{(\cdot)}) = C_\chi.$$

Then

$$\mathsf{E}\, S_w = \sum_{x_1 \dots x_n} \pi_{x_1} \cdots \pi_{x_n}\, S_{x_1} \otimes \cdots \otimes S_{x_n} = \bar{S}_\pi^{\otimes n}.$$

Taking the expectations in (5.52) and using the independence of random operators $S_w$, $P_{w'}$, we obtain

$$\mathsf{E}\, P_e(W, M) \le 4\, \mathrm{Tr}\, \bar{S}_\pi^{\otimes n} (I - P) + 4\, \mathsf{E}\, \mathrm{Tr}\, S_w (I - P_w) + (N - 1)\, \mathrm{Tr}\, \bar{S}_\pi^{\otimes n} P\, \mathsf{E} P_{w'}.$$

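By inequalities (5.41), (5.46), expressing typicality of the projectors $P$, $P_w$, and by inequality (1.18) we have

$$\mathsf{E}\, P_e(W, M) \le 8\varepsilon + (N - 1)\, \|\bar{S}_\pi^{\otimes n} P\|\, \mathrm{Tr}\, \mathsf{E} P_{w'},$$

for $n \ge n(\pi, \varepsilon, \delta)$. By the property (5.44) of the projector $P$,

$$\|\bar{S}_\pi^{\otimes n} P\| \le 2^{-n[H(\bar{S}_\pi) - \delta]},$$

and by the property i. of the projector $P_w$,

$$\mathrm{Tr}\, \mathsf{E} P_{w'} = \mathsf{E}\, \mathrm{Tr}\, P_{w'} \le \mathsf{E}\, \mathrm{Tr}\, S_{w'} \cdot 2^{n[H_\pi(S_{(\cdot)}) + \delta]} = 2^{n[H_\pi(S_{(\cdot)}) + \delta]}.$$

Remembering that $H(\bar{S}_\pi) - H_\pi(S_{(\cdot)}) = C_\chi$, we obtain

$$p_e(n, N) \le \mathsf{E}\, P_e(W, M) \le 8\varepsilon + N\, 2^{-n[C_\chi - 2\delta]}.$$

Thus, if $R \le C_\chi - 3\delta$,

$$p_e(n, 2^{nR}) \le 8\varepsilon + 2^{-n\delta},$$

hence, the Direct Coding Theorem follows. □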

A slight modification of this argument shows exponential decay of the average error probability. Namely, there exists a $\beta > 0$, such that

$$\mathsf{E}\, P_e(W, M) \le 2^{-n\beta}. \tag{5.54}$$

Indeed, we already have such an estimate for the last term in (5.53), provided $N \le 2^{n[C_\chi - 3\delta]}$, so we have to establish it for the first two terms. But this follows by application of inequality (4.11) to the probabilities of large deviations (5.42), (5.48) (Exercise).
5.7 The reliability function for pure-state channel

In the classical case, pure signal states correspond to degenerate distributions, and the Coding Theorem trivially implies the maximal value of the channel capacity $\log |\mathcal{X}|$. However, for pure quantum nonorthogonal states, the Coding Theorem remains nontrivial, although the situation is simplified, allowing us to obtain a stronger result. Here, we give a different proof of the direct statement of the Coding Theorem in the case of pure states $S_x = |\psi_x\rangle\langle\psi_x|$, which does not use the notion of typicality but, due to a more sophisticated analysis, gives the estimate for the exponential speed of decay of the error probability.

Theorem 5.23 (Burnashev and Holevo [33]). For $R < C_\chi$, the following inequality holds

$$p_e(n, 2^{nR}) \le 2 \cdot 2^{-nE(R)},$$

where

$$E(R) = \max_{0 \le r \le 1} \left[ -Rr + \max_\pi \left( -\log \mathrm{Tr}\, \bar{S}_\pi^{1+r} \right) \right] > 0. \tag{5.55}$$

The function $E(R)$ gives a lower bound for the so-called reliability function of the channel, characterizing the exponential speed of decay of the error probability $p_e(n, 2^{nR})$ as $n \to \infty$.
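To get a feeling for the exponent (5.55), it can be evaluated on a grid for the binary pure-state channel. The sketch below is an illustration under stated assumptions: the input distribution is fixed to the uniform one instead of being optimized (which still yields a valid lower bound on $E(R)$), the average state then has eigenvalues $(1 \pm \varepsilon)/2$, logarithms are base 2, and the maximization over $r$ is a plain grid search. The function names are hypothetical.

```python
import math


def mu(r: float, eps: float) -> float:
    """-log2 Tr S^{1+r} for the average state with eigenvalues (1 +/- eps) / 2."""
    lam_plus, lam_minus = (1 + eps) / 2, (1 - eps) / 2
    return -math.log2(lam_plus ** (1 + r) + lam_minus ** (1 + r))


def reliability_lower_bound(rate: float, eps: float, steps: int = 1000) -> float:
    """Grid approximation of max over 0 <= r <= 1 of [mu(r) - rate * r]."""
    return max(mu(k / steps, eps) - rate * k / steps for k in range(steps + 1))


eps = 0.5  # overlap of the two signal states (illustrative value)
for rate in (0.2, 0.4, 0.6, 0.8):
    print(rate, round(reliability_lower_bound(rate, eps), 4))
```

For $\varepsilon = 0.5$ the capacity is $h_2(0.75) \approx 0.811$ bit; the computed exponent is positive below this rate, decreases as the rate grows, and vanishes at and above capacity, where the maximum is attained at $r = 0$.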

Proof. Consider the subspace of the space $\mathcal{H}^{\otimes n}$, spanned by the code vectors $\psi_{w^{(1)}}, \dots, \psi_{w^{(N)}}$. As shown in Section 2.3.4, we can use the Gram operator $G = \sum_j |\psi_{w^{(j)}}\rangle\langle\psi_{w^{(j)}}|$ to construct the overcomplete system

$$|\hat{\psi}_j\rangle = G^{-1/2} |\psi_{w^{(j)}}\rangle, \qquad j = 1, \dots, N,$$

and hence the observable $M$ with the components

$$M_j = |\hat{\psi}_j\rangle\langle\hat{\psi}_j|, \qquad j = 1, \dots, N, \tag{5.56}$$

which can be extended to the whole space $\mathcal{H}^{\otimes n}$ by adjoining the projector $M_0$ onto the orthogonal complement to $\psi_{w^{(1)}}, \dots, \psi_{w^{(N)}}$. This will be our suboptimal decoding for the composite channel. Note that it differs from (5.49), in that it does not contain the typical projector $P$ (while the conditionally typical projectors in the case of pure states are $P_w = |\psi_w\rangle\langle\psi_w|$). In this way, we obtain the upper bound for the average error probability expressed via the Gram operator of the code vectors:
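$$P_e(W, M) \le \frac{2}{N} \sum_{j=1}^{N} \left[ 1 - \langle \psi_{w^{(j)}} | \hat{\psi}_j \rangle \right] = \frac{2}{N} \left( N - \mathrm{Tr}\, G^{1/2} \right). \tag{5.57}$$

Assuming that the codewords $\{w^{(j)}\}$ are randomly chosen, independently of each other, with the distribution (5.47), we obtain

$$\mathsf{E}\, P_e(W, M) \le \frac{2}{N} \left( N - \mathsf{E}\, \mathrm{Tr}\, G^{1/2} \right). \tag{5.58}$$

By using the inequality

$$-2x^{1/2} \le x^2 - 3x, \qquad x \ge 0,$$

which follows from (5.51), in combination with the obvious one $-2x^{1/2} \le 0$, we have

$$-2G^{1/2} \le \begin{cases} G^2 - 3G, \\ 0. \end{cases} \tag{5.59}$$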

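The expectation of the Gram operator is equal to

$$\begin{aligned}
\mathsf{E}\, G &= \mathsf{E} \sum_{j=1}^{N} |\psi_{w^{(j)}}\rangle\langle\psi_{w^{(j)}}| = N\, \mathsf{E}\, |\psi_{w^{(j)}}\rangle\langle\psi_{w^{(j)}}| \\
&= N \sum_{x_1 \dots x_n} \pi_{x_1} \cdots \pi_{x_n}\, |\psi_{x_1}\rangle\langle\psi_{x_1}| \otimes \cdots \otimes |\psi_{x_n}\rangle\langle\psi_{x_n}| \\
&= N \Big[ \sum_x \pi_x |\psi_x\rangle\langle\psi_x| \Big]^{\otimes n} = N \bar{S}_\pi^{\otimes n}.
\end{aligned} \tag{5.60}$$

Similarly

$$\begin{aligned}
\mathsf{E}(G^2 - G) &= \mathsf{E} \Big( \sum_{j,k=1}^{N} |\psi_{w^{(j)}}\rangle\langle\psi_{w^{(j)}} | \psi_{w^{(k)}}\rangle\langle\psi_{w^{(k)}}| - \sum_{j=1}^{N} |\psi_{w^{(j)}}\rangle\langle\psi_{w^{(j)}}| \Big) \\
&= \mathsf{E} \Big( \sum_{j \neq k} |\psi_{w^{(j)}}\rangle\langle\psi_{w^{(j)}} | \psi_{w^{(k)}}\rangle\langle\psi_{w^{(k)}}| \Big) = N(N-1) \big(\bar{S}_\pi^{\otimes n}\big)^2,
\end{aligned} \tag{5.61}$$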
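so that in combination with (5.59) we obtain

$$-2\, \mathsf{E}\, G^{1/2} \le \begin{cases} N(N-1) \big(\bar{S}_\pi^{\otimes n}\big)^2 - 2N \bar{S}_\pi^{\otimes n}, \\ 0. \end{cases} \tag{5.62}$$

Let $\{|e_J\rangle\}$ be the orthonormal basis of eigenvectors and $\{\lambda_J\}$ the corresponding eigenvalues of the operator $\bar{S}_\pi^{\otimes n}$. In this case,

$$-2 \langle e_J | \mathsf{E}\, G^{1/2} | e_J \rangle \le \begin{cases} N(N-1) \lambda_J^2 - 2N \lambda_J, \\ 0, \end{cases} \tag{5.63}$$

hence, using the inequalities $\min(a, b) \le a^r b^{1-r}$ and $2^{1-r} \le 2$ for $0 \le r \le 1$, we obtain

$$N \lambda_J \min[(N-1) \lambda_J, 2] - 2N \lambda_J \le 2N(N-1)^r \lambda_J^{1+r} - 2N \lambda_J, \tag{5.64}$$

whence

$$-2\, \mathsf{E}\, \mathrm{Tr}\, G^{1/2} \le -2N + 2N(N-1)^r\, \mathrm{Tr}\, \big(\bar{S}_\pi^{\otimes n}\big)^{1+r}.$$

Coming back to (5.58), we can evaluate the average error probability minimized over all codes as

$$p_e(n, N) \le 2 N^r \big(\mathrm{Tr}\, \bar{S}_\pi^{1+r}\big)^n$$

for any $0 \le r \le 1$. Putting $N = 2^{nR}$, we obtain

$$p_e(n, 2^{nR}) \le 2 \cdot 2^{n(Rr + \log \mathrm{Tr}\, \bar{S}_\pi^{1+r})}. \tag{5.65}$$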

(5.66) 

Now, we can use the freedom in the choice of the parameters $r$ and $\pi$. Note that $-\log \mathrm{Tr}\, \bar{S}_\pi^{1+r}$ is a concave function of $r$, which can be established by showing that the second derivative is negative (Exercise). Also,

$$\left( -\log \mathrm{Tr}\, \bar{S}_\pi^{1+r} \right)' \Big|_{r=0} = -\mathrm{Tr}\, \bar{S}_\pi \log \bar{S}_\pi = H(\bar{S}_\pi). \tag{5.67}$$

Take for $\pi$ the optimal distribution, for which $H(\bar{S}_\pi) = C_\chi$. Then, for $R < C_\chi$ we have $p_e(n, 2^{nR}) \le 2 \cdot 2^{-nE(R)}$, where $E(R)$ is given by expression (5.55). □
5.8 Notes and references

1. The issue of the information capacity of quantum communication channels arose in the sixties in the papers of Gordon [71], Forney [60], Lebedev and Levitin [142], and Stratonovich [203] and goes back to even earlier, classical works of Gabor and Brillouin, that ask for fundamental physical limits on the rate and quality of information transmission (for detailed reference, see the surveys [38], [98] and the books [86], [107]). These works laid a physical foundation and raised the question of consistent quantum information treatment of the problem.

2. The proof of Theorem 5.9, including the criterion for the equality, based on comparison of the convexity of the quantities in (5.16), was given in [95]. The symmetric logarithmic derivative of a family of quantum states and the corresponding quantum Fisher information were introduced by Helstrom in the context of quantum estimation theory [86].

3. A detailed comparison of the capacities $C_1$, $C_\chi$ for different channels was made in the paper of Hirota et al. [127].

The Strong Converse

$$\lim_{n \to \infty} p_e(n, 2^{nR}) = 1 \quad \text{for } R > C_\chi$$

was obtained by Ogawa and Nagaoka [160], see also the book [78].

Formula (5.40) follows from the work of Levitin [144], who showed that for two pure states and arbitrary $\pi$ the maximum $\max_M \mathcal{I}_1(\pi, M)$ over the two-valued sharp observables in the plane is attained by the observable that maximizes the average probability of correct decision, and from the work of Shor [192], who proved that the two-valued sharp observables are sufficient in this case, and the maximum $\max_M \mathcal{I}_1(\pi, M)$ is a concave symmetric function of $\pi$, implying that the optimal distribution is uniform.

The trine channel was introduced by Holevo [94], who proved that for the uniform distribution $\pi$, the maximum $\max_M \mathcal{I}_1(\pi, M)$ over all observables is strictly greater than the maximum over sharp observables ($\log \sqrt{3} - \frac{1}{3} \approx 0.459$). It is interesting that the information-optimal observable in this problem differs from the maximal likelihood observable (see Section 2.4.2) by rotation by the angle $\pi/2$, the latter being the least informative observable. A complete treatment of $\max_M \mathcal{I}_1(\pi, M)$ for an arbitrary, symmetric family of pure states in two-dimensional Hilbert space was given by Sasaki, Barnett, Jozsa, Osaki and Hirota [176], who also discussed experimental implementation of the optimal measuring procedures. Conjecture (5.39) was confirmed numerically in [126], and an additional argument was given in [192]. A remarkable study of the “lifted trine” channel, i.e. the channel with three linearly independent, equiangular state vectors in 3-dimensional Hilbert space was given by Shor [192]. It was shown, in particular, that the information-optimal observable in this case has in general 6 outcomes, i.e. the maximal value admitted by Proposition 5.8 in the case $d = 3$ (although the gain, as compared to 3 outcomes, is tiny).

4. The idea of quantum data compression, originally due to Schumacher, is one of the central ones in quantum information theory. Theorem 5.22 was proved by Jozsa and Schumacher [125]. For a survey of more recent results, from different authors, on the much more complicated problem of data compression for sources with mixed states, see the book of Hayashi [78].

5. The proof of the coding theorem for a general c-q channel was given by Holevo [97] and by Schumacher and Westmoreland [177]. See Winter [223] for a different proof in the spirit of Csiszár and Körner's approach [44] to the classical coding theorem. This was preceded by the consideration of the pure-state channel in the work of Hausladen, Jozsa, Schumacher, Westmoreland and Wootters [77], who used the technique of typical subspaces discussed in Section 5.5. It should be noted that Shannon's method of random coding for the channel with pure, almost orthogonal states was first applied by Stratonovich and Vancian [205], who used the first term in the minimal error probability in the asymptotics e → 1 (almost orthogonal states) to conclude that this method allows us to obtain the so-called cut-off rate

$$\tilde{C} = -\log \min_\pi \mathrm{Tr}\, \bar{S}_\pi^2 < C. \tag{5.68}$$

As shown by Burnashev and Holevo [33], the properly used method of random coding allows us to obtain not only the classical capacity $C$, but also the exponential estimate for the error probability, in the spirit of the classical results in Gallager [66]. Recently, Dalai [45] was able to extend the classical sphere packing bound of Fano, Shannon, Gallager and Berlekamp to the pure state c-q channel, to obtain the upper bound on the reliability function, which coincides, at high rates, with the lower bound $E(R)$ of [33]. The paper [33] also contains a formulation of the hypothesis concerning the reliability function for an arbitrary mixed-state channel. In this case, like in the others, a mixed-state problem is much more complex than its pure-state analog.
Chapter 6

Quantum evolutions and channels 


6.1 Quantum evolutions 

In this section, the notion of a quantum communication channel will be considered from
a general point of view. As we have seen in Chapter 4, a classical communication
channel is completely characterized by the transition probability matrix

$$[\,p(y|x)\,]_{x \in \mathcal{X},\; y \in \mathcal{Y}}.$$

Such a channel describes an affine transformation $p_x \to p'_y = \sum_x p(y|x)\, p_x$ of
classical states (probability distributions) $p = \{p_x\}$ on the input alphabet $\mathcal{X}$ into the
classical states $p' = \{p'_y\}$ on the output alphabet $\mathcal{Y}$.

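To have a proper quantum analog of a channel we will therefore first look for maps
taking density operators into density operators, which respect convex combinations,
i.e. affine maps $\Phi : \mathfrak{S}(\mathcal{H}) \to \mathfrak{S}(\mathcal{H})$:

$$\Phi\Big[\sum_j p_j S_j\Big] = \sum_j p_j\, \Phi[S_j]; \qquad p_j \ge 0, \quad \sum_j p_j = 1; \quad S_j \in \mathfrak{S}(\mathcal{H}).$$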


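Exercise 6.1. Prove the following statement: an affine map $\Phi$ of $\mathfrak{S}(\mathcal{H})$ into itself
can be uniquely extended to a map of the linear space $\mathfrak{T}(\mathcal{H})$ with the properties

i. $\Phi$ is linear: $\Phi[\sum_j c_j S_j] = \sum_j c_j \Phi[S_j]$, $c_j \in \mathbb{C}$, $S_j \in \mathfrak{T}(\mathcal{H})$

ii. $\Phi$ is positive: $S \in \mathfrak{T}(\mathcal{H})$, $S \ge 0 \Rightarrow \Phi[S] \ge 0$

iii. $\Phi$ is trace preserving: $\operatorname{Tr} \Phi[S] = \operatorname{Tr} S$, $S \in \mathfrak{T}(\mathcal{H})$

Hint: Follow the proof of Theorem 2.6 to obtain the linear extension of $\Phi$ to
$\mathfrak{T}(\mathcal{H})$. Properties ii., iii. are then straightforward.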

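Definition 6.2. For every map $\Phi : \mathfrak{T}(\mathcal{H}) \to \mathfrak{T}(\mathcal{H})$ satisfying the properties i.-iii.,
the dual map $\Phi^{*} : \mathfrak{B}(\mathcal{H}) \to \mathfrak{B}(\mathcal{H})$ is defined by the formula

$$\operatorname{Tr} \Phi[S]\, X = \operatorname{Tr} S\, \Phi^{*}[X], \qquad S \in \mathfrak{T}(\mathcal{H}),\ X \in \mathfrak{B}(\mathcal{H}). \qquad (6.1)$$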
Exercise 6.3. Prove the following statement: the properties of the map $\Phi$ from
Exercise 6.1 are equivalent to the following:

i. $\Phi^{*}$ is linear;

ii. $\Phi^{*}$ is positive: $X \ge 0 \Rightarrow \Phi^{*}[X] \ge 0$;

iii. $\Phi^{*}$ is unital: $\Phi^{*}[I] = I$.

The map $\Phi$ describes an evolution of the quantum system in terms of states (the
Schrödinger picture), while the map $\Phi^{*}$ does this in terms of observables (the
Heisenberg picture). Notice that positive linear maps transform Hermitian operators into
Hermitian operators. In the case of a finite-dimensional $\mathcal{H}$ that we are considering, the
linear space $\mathfrak{T}(\mathcal{H})$ and the algebra $\mathfrak{B}(\mathcal{H})$ coincide with the space of all linear
operators in $\mathcal{H}$, but we use different notations to stress that $\mathfrak{T}(\mathcal{H})$ is the arena for the states
and the Schrödinger picture, while $\mathfrak{B}(\mathcal{H})$ is that of the observables and the
Heisenberg picture. This distinction becomes quite substantial in the infinite-dimensional
case considered in Chapters 11, 12.

Example 6.4. Let $U$ be a unitary operator. In this case, $\Phi[S] = USU^{*}$ is an affine,
one-to-one mapping of the set of quantum states $\mathfrak{S}(\mathcal{H})$ onto itself (i.e. an affine
bijection), describing reversible evolution. In the Heisenberg picture $\Phi^{*}[X] = U^{*}XU$.

The following result, going back to the famous Wigner's Theorem, characterizes all
reversible evolutions.

Theorem 6.5. Let $\Phi$ be an affine bijection of the convex set $\mathfrak{S}(\mathcal{H})$. In this case,

$$\Phi[S] = USU^{*}, \quad \text{or} \quad \Phi[S] = US^{T}U^{*}, \qquad (6.2)$$

where $U$ is a unitary operator and $S^{T}$ is the matrix transposition in some basis.
Equivalently, $\Phi[S] = USU^{*}$, where $U$ is a unitary or antiunitary operator.

An antiunitary operator $U$ is characterized by the properties

i. $\|U\psi\| = \|\psi\|$;

ii. $U(\sum_j c_j \psi_j) = \sum_j \bar{c}_j U\psi_j$.

Such an operator can always be represented in the form $U = \tilde{U}\Lambda$, where $\tilde{U}$ is a
unitary operator and $\Lambda = \Lambda^{*}$ is the antiunitary operator of the complex conjugation, in a
fixed basis. The corresponding evolution of states is given by the matrix transposition
in this basis

$$S^{T} = \Lambda S \Lambda^{*}.$$


In physics, transposition and complex conjugation are associated with time inversion. 
Proof. We rely on the case of the qubit ($\dim \mathcal{H} = 2$), for which the proof will be
given in Section 6.8. Let $\psi_1, \psi_2$ be some linearly independent vectors in $\mathcal{H}$. Since an
affine bijection of a convex set maps extreme points into extreme points, we have

$$\Phi[|\psi_j\rangle\langle\psi_j|] = |\varphi_j\rangle\langle\varphi_j|; \qquad j = 1, 2. \qquad (6.3)$$

Denote by $\mathcal{H}_2$ the two-dimensional subspace generated by $\psi_1, \psi_2$. In this case, for
any $|\psi\rangle \in \mathcal{H}_2$ we have

$$|\psi\rangle\langle\psi| \le c\,(|\psi_1\rangle\langle\psi_1| + |\psi_2\rangle\langle\psi_2|).$$

On the other hand, similar to (6.3),

$$\Phi[|\psi\rangle\langle\psi|] = |\varphi\rangle\langle\varphi|.$$

Because of the positivity of $\Phi$,

$$|\varphi\rangle\langle\varphi| \le c\,(|\varphi_1\rangle\langle\varphi_1| + |\varphi_2\rangle\langle\varphi_2|),$$

i.e. $|\varphi\rangle$ belongs to a two-dimensional subspace that contains the vectors $\varphi_1, \varphi_2$. By
an additional unitary transformation, we can make this subspace coincide with $\mathcal{H}_2$.
Hence, we can apply the aforementioned result for the qubit to the restriction of $\Phi$
onto the subset of states supported by the two-dimensional subspace $\mathcal{H}_2 \subset \mathcal{H}$ to
conclude that this restriction is given by the formula (6.2), where $U$ may, however,
depend on that subspace.

The proof in the general case is obtained by performing appropriate transformations
of the subspace $\mathcal{H}_2$ to show that this dependence reduces to an inessential phase factor.
Let $\{e_j;\ j = 1, \dots, d\}$ be an orthonormal basis in $\mathcal{H}$. In this case,

$$\Phi[|e_j\rangle\langle e_j|] = |h_j\rangle\langle h_j|.$$

By applying the aforementioned result for $\mathcal{H}_2$ generated by the vectors $e_j, e_k$, we
obtain that $\{h_j\}$ is an orthonormal basis. Moreover,

$$\Phi[|e_j\rangle\langle e_k|] = z_{jk}\,|h_j\rangle\langle h_k| \quad \text{or} \quad z_{kj}\,|h_k\rangle\langle h_j|, \qquad (6.4)$$

where $|z_{jk}| = 1$. If $\Phi[|e_1\rangle\langle e_2|] = z_{12}|h_1\rangle\langle h_2|$ but $\Phi[|e_1\rangle\langle e_3|] = z_{31}|h_3\rangle\langle h_1|$, then

$$\Phi[|e_1\rangle\langle e_2 + e_3|] = z_{12}\,|h_1\rangle\langle h_2| + z_{31}\,|h_3\rangle\langle h_1|$$

is an operator of rank 2, which contradicts (6.4). Repeating this argument, we obtain
that in relation (6.4) one and the same alternative takes place for all $j, k$.

Consider the first alternative, when

$$\Phi[|e_j\rangle\langle e_k|] = z_{jk}\,|h_j\rangle\langle h_k|.$$
Now, putting $\psi = e_1 + \dots + e_d$, we obtain that the operator

$$\Phi[|\psi\rangle\langle\psi|] = \sum_{j,k=1}^{d} z_{jk}\,|h_j\rangle\langle h_k|$$

has rank 1, hence $z_{jk} = a_j \bar{a}_k$, where $|a_j| = 1$ for all $j$. Denoting by $U$ the unitary
operator for which $U|e_j\rangle = a_j|h_j\rangle$ we obtain $\Phi[S] = USU^{*}$. Similarly, in the case
of the second alternative we obtain $\Phi[S] = US^{T}U^{*}$, where $T$ is a transposition in
the basis $\{e_j\}$. □


6.2 Completely positive maps 

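Generalizing the previous considerations, we shall consider a linear map $\Phi$ acting
from $\mathfrak{T}(\mathcal{H}_A)$ to $\mathfrak{T}(\mathcal{H}_B)$, where $\mathcal{H}_A$, $\mathcal{H}_B$ are, in general, different spaces. In this
case, the dual map $\Phi^{*}$ acts from $\mathfrak{B}(\mathcal{H}_B)$ to $\mathfrak{B}(\mathcal{H}_A)$.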

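Definition 6.6. The map $\Phi^{*}$ ($\Phi$) is called completely positive (CP) if one of the two
equivalent conditions holds:

i. $\Phi^{*} \otimes \operatorname{Id}_n$ is positive for all $n = 1, 2, \dots$, where $\operatorname{Id}_n$ denotes the identity map of
the algebra of all $n \times n$-matrices $\mathfrak{B}(\mathbb{C}^n)$.

ii. For arbitrary finite sets of vectors $\{\varphi_j\} \subset \mathcal{H}_B$, $\{\psi_j\} \subset \mathcal{H}_A$ the following
inequality holds:

$$\sum_{j,k} \langle \psi_j |\, \Phi^{*}[|\varphi_j\rangle\langle\varphi_k|]\, \psi_k \rangle \ge 0.$$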


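To see the equivalence of the above conditions, let us introduce the spaces $\mathcal{H}_i^{(n)}$,
which are direct orthogonal sums of $n$ copies of $\mathcal{H}_i$; $i = A, B$, where $n$ is the size
of the given sets of vectors $\{\varphi_j\}$, $\{\psi_j\}$. As we have seen in Section 3.1.1, $\mathcal{H}_i^{(n)}$ can be
identified with $\mathcal{H}_i \otimes \mathbb{C}^n$. In this case, denoting

$$\varphi^{(n)} = \bigoplus_{j=1}^{n} \varphi_j, \qquad \psi^{(n)} = \bigoplus_{j=1}^{n} \psi_j,$$

we have

$$\sum_{j,k} \langle \psi_j |\, \Phi^{*}[|\varphi_j\rangle\langle\varphi_k|]\, \psi_k \rangle = \langle \psi^{(n)} |\, (\Phi^{*} \otimes \operatorname{Id}_n)[|\varphi^{(n)}\rangle\langle\varphi^{(n)}|]\, | \psi^{(n)} \rangle,$$

from which the implication i. ⇒ ii. follows. Conversely, any positive operator
in $\mathcal{H}_B^{(n)} \simeq \mathcal{H}_B \otimes \mathbb{C}^n$ can be represented as a sum of positive rank one operators of
the form $|\varphi^{(n)}\rangle\langle\varphi^{(n)}|$. Hence, condition ii., which means positivity of $\Phi^{*} \otimes \operatorname{Id}_n$ on
such operators, implies i.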
Exercise 6.7. Prove the following statement: if the map $\Phi^{*}$ satisfies the
condition i. for $n = 2$, the Kadison inequality holds

$$\Phi^{*}[X]^{*}\, \Phi^{*}[X] \le \|\Phi^{*}[I]\|\, \Phi^{*}[X^{*}X], \qquad (6.5)$$

for an arbitrary operator $X$.


Exercise 6.8. Show that for the transposition map $S \to S^{T}$ in some basis,
condition i. already breaks for $n = 2$. Hint: consider condition ii. where $\{\psi_j\}$ is the
orthonormal basis in which the transposition is made.
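
The failure of complete positivity for the transposition can be checked numerically. The following is a minimal sketch, not part of the original exercise: it applies $T \otimes \operatorname{Id}_2$ to the rank one operator built from the vector $e_0 \otimes e_0 + e_1 \otimes e_1$ and evaluates the resulting quadratic form on an antisymmetric vector. The function names and the index convention (row index $2a + r$) are assumptions made for this illustration.

```python
# Partial transposition on a two-qubit operator, using plain nested lists.
# Index convention (an assumption of this sketch): row/column index = 2*a + r,
# where a labels the first (transposed) factor and r the second one.

def partial_transpose_first(m):
    """Apply (T ⊗ Id_2): exchange the first-factor row and column indices."""
    out = [[0.0] * 4 for _ in range(4)]
    for a in range(2):
        for r in range(2):
            for b in range(2):
                for s in range(2):
                    out[2 * b + r][2 * a + s] = m[2 * a + r][2 * b + s]
    return out

def quadratic_form(m, v):
    """Return <v| m |v> for a real vector v."""
    return sum(v[i] * m[i][j] * v[j] for i in range(4) for j in range(4))

# Unnormalised vector e_0⊗e_0 + e_1⊗e_1 and the positive rank one operator it defines.
omega = [1.0, 0.0, 0.0, 1.0]
rho = [[omega[i] * omega[j] for j in range(4)] for i in range(4)]

rho_pt = partial_transpose_first(rho)   # turns out to be the swap operator
antisym = [0.0, 1.0, -1.0, 0.0]         # e_0⊗e_1 - e_1⊗e_0

before = quadratic_form(rho, antisym)   # form of the positive input operator
after = quadratic_form(rho_pt, antisym) # form after partial transposition
```

The input operator is positive, yet after partial transposition the quadratic form on the antisymmetric vector is negative, so $T \otimes \operatorname{Id}_2$ is not a positive map.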

The notion of complete positivity was introduced by Stinespring [202] who proved 
an important result, generalizing Naimark’s Theorem. We shall give the proof of 
Stinespring’s Dilation Theorem in the special finite-dimensional case. 


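Theorem 6.9. For every completely positive map $\Phi^{*} : \mathfrak{B}(\mathcal{H}_B) \to \mathfrak{B}(\mathcal{H}_A)$ there exist
a Hilbert space $\mathcal{H}_E$ and an operator $V : \mathcal{H}_A \to \mathcal{H}_B \otimes \mathcal{H}_E$ such that

$$\Phi^{*}[X] = V^{*}(X \otimes I_E)V, \qquad X \in \mathfrak{B}(\mathcal{H}_B). \qquad (6.6)$$

Dually,

$$\Phi[S] = \operatorname{Tr}_E VSV^{*}, \qquad S \in \mathfrak{T}(\mathcal{H}_A). \qquad (6.7)$$

The map $\Phi^{*}$ is unital, i.e. $\Phi^{*}[I_B] = I_A$ (the map $\Phi$ preserves the trace) if and only if
the operator $V$ is isometric.

In relation (6.7) and in what follows we use the abbreviated notation for the partial
trace: $\operatorname{Tr}_E = \operatorname{Tr}_{\mathcal{H}_E}$ etc.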


Proof. Consider the algebraic tensor product $\mathcal{L} = \mathcal{H}_A \otimes \mathfrak{B}(\mathcal{H}_B)$, generated by the
elements $\Psi = \psi \otimes X$, $\psi \in \mathcal{H}_A$, $X \in \mathfrak{B}(\mathcal{H}_B)$. Let us introduce a pre-inner product
in $\mathcal{L}$ with the corresponding square of the norm

$$\Big\| \sum_j \psi_j \otimes X_j \Big\|^{2} = \sum_{j,k} \langle \psi_j |\, \Phi^{*}[X_j^{*} X_k]\, | \psi_k \rangle.$$

This quantity is nonnegative because the block operator $X^{(n)} = [X_j^{*} X_k]_{j,k=1,\dots,n}$ in
$\mathcal{H}_B^{(n)}$ is positive and the map $\Phi^{*}$ is CP. Taking the quotient with respect to the subspace
$\mathcal{L}_0$ of zero norm, we obtain the Hilbert space $\mathcal{K} = \mathcal{L}/\mathcal{L}_0$. Define $V$ and $\pi$ by the
relations $V\psi = \psi \otimes I$ and $\pi[Y]\Psi = \pi[Y](\psi \otimes X) = \psi \otimes YX$. It is easy to check that
these definitions agree with taking the quotient. Then $\pi$ is a *-homomorphism of the
algebra $\mathfrak{B}(\mathcal{H}_B)$ into $\mathfrak{B}(\mathcal{K})$, i.e. a linear map that preserves the algebraic operations
and the involution:

$$\pi[XY] = \pi[X]\,\pi[Y], \qquad \pi[X^{*}] = \pi[X]^{*}.$$





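The map $\pi$ is unital. Moreover,

$$\langle \varphi |\, \Phi^{*}[X]\, | \psi \rangle = \langle \varphi \otimes I \,|\, \psi \otimes X \rangle = \langle \varphi |\, V^{*} \pi[X] V\, | \psi \rangle, \qquad X \in \mathfrak{B}(\mathcal{H}_B),$$

that is

$$\Phi^{*}[X] = V^{*} \pi[X] V. \qquad (6.8)$$

However, by Lemma 6.10 proved below, any *-homomorphism of the algebra
$\mathfrak{B}(\mathcal{H}_B)$ is unitarily equivalent to the ampliation $\pi[X] = X \otimes I_E$, where $I_E$ is the
unit operator in a Hilbert space $\mathcal{H}_E$, i.e. we can take $\mathcal{K} = \mathcal{H}_B \otimes \mathcal{H}_E$, and the
Stinespring representation (6.8) takes the form (6.6).

The dual representation (6.7) follows from (6.6) and from the definition of the dual
map (6.1). □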

In this construction, the dimensionality of the space $\mathcal{H}_E$ satisfies

$$\dim \mathcal{H}_E \le d_A d_B, \qquad (6.9)$$

where $d_A = \dim \mathcal{H}_A$, $d_B = \dim \mathcal{H}_B$. Indeed, $\dim \mathcal{H}_E = \dim \mathcal{K}/d_B \le \dim \mathcal{L}/d_B$
and $\dim \mathcal{L} = d_A d_B^{2}$. However, there is no a priori restriction and any map of the
form (6.6) is completely positive as a concatenation of the ampliation $X \to X \otimes I$ and the
conjugation $X \to V^{*}XV$, the complete positivity of which follows from
Definition 6.6.

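Lemma 6.10. Let $\pi$ be a unital *-homomorphism of the algebra $\mathfrak{B}(\mathcal{H})$ into the
algebra $\mathfrak{B}(\mathcal{K})$. In this case, there exist a Hilbert space $\mathcal{H}_E$ and an isometric map $U$ of the
space $\mathcal{K}$ onto $\mathcal{H} \otimes \mathcal{H}_E$ such that

$$\pi[X] = U^{*}(X \otimes I_E)U, \qquad X \in \mathfrak{B}(\mathcal{H}). \qquad (6.10)$$

Proof. Choose an orthonormal basis $\{e_j;\ j = 1, \dots, d\}$ in $\mathcal{H}$, and consider the matrix
units $|e_j\rangle\langle e_k|$ and their images

$$V_{jk} = \pi[|e_j\rangle\langle e_k|] \in \mathfrak{B}(\mathcal{K}).$$

Since $\pi$ is a *-homomorphism, the operators $V_{jk}$ satisfy the same algebraic relations as
the matrix units

$$V_{jk} V_{lm} = \delta_{kl} V_{jm}, \qquad V_{jk}^{*} = V_{kj}. \qquad (6.11)$$

Consider the subspace $\mathcal{H}_E = V_{11}\mathcal{K} \subset \mathcal{K}$ and define $U : \mathcal{K} \to \mathcal{H} \otimes \mathcal{H}_E$ by the
relation

$$U\psi = \sum_j |e_j\rangle \otimes V_{1j}\psi.$$

(Note that $V_{1j}\psi = V_{11}V_{1j}\psi \in \mathcal{H}_E$.) In this case,

$$\langle U\psi_1 |\, (X \otimes I_E)\, U\psi_2 \rangle = \sum_{j,k} \langle e_j | X e_k \rangle \langle V_{1j}\psi_1 | V_{1k}\psi_2 \rangle. \qquad (6.12)$$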
By (6.11)

$$\langle V_{1j}\psi_1 | V_{1k}\psi_2 \rangle = \langle \psi_1 | V_{jk}\psi_2 \rangle = \langle \psi_1 |\, \pi[|e_j\rangle\langle e_k|]\, \psi_2 \rangle.$$

Therefore, the right hand side of (6.12) is equal to

$$\sum_{j,k} \langle e_j | X e_k \rangle \langle \psi_1 |\, \pi[|e_j\rangle\langle e_k|]\, \psi_2 \rangle = \langle \psi_1 |\, \pi[X]\, \psi_2 \rangle.$$

This implies (6.10). Taking $X = I$ and using the unitality of $\pi$, we obtain that $U$ is
an isometric map of $\mathcal{K}$ into $\mathcal{H} \otimes \mathcal{H}_E$.

Exercise 6.11. Show that the image of $\mathcal{K}$ under the map $U$ coincides with
$\mathcal{H} \otimes \mathcal{H}_E$. □

The Stinespring representation (6.6) for a given CP map is not unique.
Representations for which $\dim \mathcal{H}_E$ is minimal are called minimal. In this case, the number
$\dim \mathcal{H}_E$ is called the rank of the map $\Phi$.

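Theorem 6.12. Let

$$\Phi^{*}[X] = \tilde{V}^{*}(X \otimes \tilde{I}_E)\tilde{V}, \qquad X \in \mathfrak{B}(\mathcal{H}_B) \qquad (6.13)$$

be another Stinespring representation, where $\tilde{I}_E$ is the unit operator in $\tilde{\mathcal{H}}_E$. Then
there exists a partial isometry $W_E$ from $\mathcal{H}_E$ to $\tilde{\mathcal{H}}_E$, such that

$$(I_B \otimes W_E)V = \tilde{V}; \qquad (6.14)$$

if both representations are minimal, then $W_E$ isometrically maps $\mathcal{H}_E$ onto $\tilde{\mathcal{H}}_E$.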

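Proof. For the representation (6.6), consider the subspace

$$\mathcal{M} = \{(X \otimes I_E)V\psi : \psi \in \mathcal{H}_A,\ X \in \mathfrak{B}(\mathcal{H}_B)\} \subset \mathcal{K} = \mathcal{H}_B \otimes \mathcal{H}_E.$$

It is invariant under multiplication by operators of the form $Y \otimes I_E$. Hence, it has the
form $\mathcal{M} = \mathcal{H}_B \otimes \mathcal{M}_E$, where $\mathcal{M}_E \subseteq \mathcal{H}_E$. For a minimal representation we should
have $\mathcal{M}_E = \mathcal{H}_E$, because otherwise there would be a proper sub-representation.

Consider a similar subspace $\tilde{\mathcal{M}} = \mathcal{H}_B \otimes \tilde{\mathcal{M}}_E$ of the space $\tilde{\mathcal{K}} = \mathcal{H}_B \otimes \tilde{\mathcal{H}}_E$ for the
second representation (6.13). Define the operator $W$ from $\mathcal{M}$ to $\tilde{\mathcal{M}}$ by the relation

$$W(X \otimes I_E)V\psi = (X \otimes \tilde{I}_E)\tilde{V}\psi. \qquad (6.15)$$

In this case, $W$ is isometric, since the squared norms of the vector $(X \otimes I_E)V\psi$ and of its
image under $W$ are both equal to $\langle \psi |\, \Phi^{*}[X^{*}X]\, | \psi \rangle$ by (6.6), (6.13).

From (6.15) we obtain, for all $Y \in \mathfrak{B}(\mathcal{H}_B)$,

$$W(YX \otimes I_E)V\psi = (Y \otimes \tilde{I}_E)W(X \otimes I_E)V\psi$$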
and hence

$$W(Y \otimes I_E) = (Y \otimes \tilde{I}_E)W \qquad (6.16)$$

on $\mathcal{M}$. Extend $W$ to the whole of $\mathcal{K}$ by setting it equal to zero on the orthogonal
complement to $\mathcal{M}$. In this case (6.16) holds on $\mathcal{K}$. Therefore, $W = I_B \otimes W_E$,
where $W_E$ isometrically maps $\mathcal{M}_E$ onto $\tilde{\mathcal{M}}_E$. Relation (6.15) implies (6.14). If
the first representation is minimal, $W_E$ is an isometry of $\mathcal{H}_E$ into $\tilde{\mathcal{H}}_E$, and if both
representations are minimal, $W_E$ is unitary onto $\tilde{\mathcal{H}}_E$. □

Corollary 6.13. A map $\Phi^{*}$ is completely positive if and only if it can be represented
in the form

$$\Phi^{*}[X] = \sum_{k=1}^{d} V_k^{*} X V_k, \qquad X \in \mathfrak{B}(\mathcal{H}_B), \qquad (6.17)$$

where $V_k : \mathcal{H}_A \to \mathcal{H}_B$, or dually

$$\Phi[S] = \sum_{k=1}^{d} V_k S V_k^{*}, \qquad S \in \mathfrak{T}(\mathcal{H}_A). \qquad (6.18)$$

The map $\Phi^{*}$ is unital ($\Phi$ is trace preserving) if and only if

$$\sum_{k=1}^{d} V_k^{*} V_k = I. \qquad (6.19)$$

This representation is often called the Kraus representation.

Proof. By writing $I_E = \sum_{k=1}^{d} |e_k^{0}\rangle\langle e_k^{0}|$, where $\{e_k^{0}\}$ is an orthonormal basis in $\mathcal{H}_E$
and $d = \dim \mathcal{H}_E$, consider the operators $V_k$ acting from $\mathcal{H}_A$ to $\mathcal{H}_B$, defined by

$$\langle \varphi | V_k | \psi \rangle = \langle \varphi \otimes e_k^{0} | V | \psi \rangle, \qquad \varphi \in \mathcal{H}_B,\ \psi \in \mathcal{H}_A. \qquad (6.20)$$

For the operators defined as in (6.20) we shall sometimes use the notation $V_k = \langle e_k^{0} | V$.
The representation (6.17) then follows from the formula (6.6). □

Given a representation (6.17) or (6.18), we can always restore the Stinespring
representation by letting $\mathcal{H}_E = \mathbb{C}^{d}$ and

$$V|\psi\rangle = \sum_{k=1}^{d} V_k|\psi\rangle \otimes |e_k^{0}\rangle.$$

Not surprisingly, the Kraus representation of a given completely positive map is not
unique. Representations with a minimal number of components come from the minimal
Stinespring representations and are also called minimal.
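
The passage between the Kraus and the Stinespring representations can be sketched numerically. The example below is an illustration under stated assumptions, not part of the original text: the two Kraus operators (an amplitude-damping pair on a qubit), the test state, and all function names are chosen here for illustration. It checks the normalization $\sum_k V_k^{*}V_k = I$, builds $V|\psi\rangle = \sum_k V_k|\psi\rangle \otimes |e_k^{0}\rangle$ as a matrix, and compares $\operatorname{Tr}_E VSV^{*}$ with $\sum_k V_k S V_k^{*}$.

```python
def matmul(a, b):
    """Matrix product of nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose."""
    return [[complex(a[i][j]).conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]

def add(a, b):
    """Entrywise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def close(a, b, tol=1e-12):
    """Entrywise comparison up to a tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def apply_kraus(kraus, s):
    """Phi[S] = sum_k V_k S V_k^*."""
    dim_b = len(kraus[0])
    out = [[0.0] * dim_b for _ in range(dim_b)]
    for v in kraus:
        out = add(out, matmul(matmul(v, s), dagger(v)))
    return out

def normalization(kraus):
    """sum_k V_k^* V_k, equal to I for a trace-preserving map."""
    dim_a = len(kraus[0][0])
    out = [[0.0] * dim_a for _ in range(dim_a)]
    for v in kraus:
        out = add(out, matmul(dagger(v), v))
    return out

def stinespring(kraus):
    """Matrix of V: rows are indexed by d*b + k (output index b, environment index k)."""
    d, dim_b, dim_a = len(kraus), len(kraus[0]), len(kraus[0][0])
    return [[kraus[k][b][a] for a in range(dim_a)]
            for b in range(dim_b) for k in range(d)]

def trace_env(m, dim_b, d):
    """Partial trace over the environment factor."""
    return [[sum(m[b * d + k][c * d + k] for k in range(d))
             for c in range(dim_b)] for b in range(dim_b)]

# Illustrative Kraus pair (an assumption of this sketch): 0.8**2 + 0.6**2 = 1.
kraus = [[[1.0, 0.0], [0.0, 0.8]], [[0.0, 0.6], [0.0, 0.0]]]
state = [[0.5, 0.5], [0.5, 0.5]]
identity = [[1.0, 0.0], [0.0, 1.0]]

gram = normalization(kraus)
out_kraus = apply_kraus(kraus, state)
v = stinespring(kraus)
isometry_check = matmul(dagger(v), v)
out_stinespring = trace_env(matmul(matmul(v, state), dagger(v)), 2, len(kraus))
```

Here `gram` and `isometry_check` both come out as the identity, in line with the statement that the map is trace preserving exactly when $V$ is isometric, and the two ways of computing the output state agree.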
Exercise 6.14. Prove the following statement: the Kraus representation is minimal
if and only if the operators $\{V_k\}$ are linearly independent. Hint: from (6.20)

$$\Big\langle \varphi \Big|\, X \sum_{k=1}^{d} c_k V_k \Big| \psi \Big\rangle = \Big\langle \varphi \otimes \sum_{k=1}^{d} \bar{c}_k e_k^{0} \Big|\, (X \otimes I_E)V \Big| \psi \Big\rangle,$$

for all $\varphi \in \mathcal{H}_B$, $\psi \in \mathcal{H}_A$, $X \in \mathfrak{B}(\mathcal{H}_B)$. Hence, $\sum_{k=1}^{d} c_k V_k = 0$ is equivalent to
$\sum_{k=1}^{d} \bar{c}_k e_k^{0} \perp \mathcal{M}_E$.

Exercise 6.15. Using Theorem 6.12, show that for any two Kraus
representations of the same completely positive map, with the operators $V_j$, $\tilde{V}_k$,
there exists a rectangular, partially isometric matrix $[u_{kj}]$, such that $\tilde{V}_k =
\sum_j u_{kj} V_j$. If both representations are minimal, the matrix is unitary.

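Exercise 6.16. Let $M = \{M_x\}$ be an observable. Show that if

$$M_x = V^{*}E_x V = \tilde{V}^{*}\tilde{E}_x \tilde{V} \qquad (6.21)$$

are two Naimark's dilations of $M$ in $\mathcal{K}$, $\tilde{\mathcal{K}}$, then there exists a partial isometry
$W : \mathcal{K} \to \tilde{\mathcal{K}}$, such that

$$WV = \tilde{V}, \qquad WE_x = \tilde{E}_x W.$$

If the dilations are minimal in the sense of the dimensionality of $\mathcal{K}$, $\tilde{\mathcal{K}}$, then $W$
is an isometry of $\mathcal{K}$ onto $\tilde{\mathcal{K}}$.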

The set of all CP maps that satisfy the normalization condition (6.19) is evidently
convex. Extreme points of this set are given by

Proposition 6.17 (Choi [41]). The unital CP map $\Phi^{*}$ is extreme if and only if it has
a representation (6.17) where the system $\{V_j^{*}V_k\}$ is linearly independent.

Proof. Let $\Phi^{*}$ be extreme and let (6.17) be its minimal representation. Let
$\sum_{jk} y_{jk} V_j^{*}V_k = 0$ and denote $Y = \sum_{jk} y_{jk} |e_j^{0}\rangle\langle e_k^{0}|$. Without loss of generality,
we can assume that $Y = Y^{*}$ and $\|Y\| \le 1$. Define

$$\Phi_{\pm}^{*}[X] = V^{*}(X \otimes (I_E \pm Y))V = \sum_{jk} (\delta_{jk} \pm y_{jk})\, V_j^{*} X V_k.$$

In this case, $\Phi_{\pm}^{*}$ are completely positive maps and $\Phi_{\pm}^{*}[I_B] = I_A$ by the assumption.
Since $\Phi^{*}$ is extreme, one has $\Phi^{*} = \Phi_{\pm}^{*}$, implying $V^{*}(X \otimes Y)V = 0$ for all $X$. It
follows that

$$\langle \psi_2 |\, V^{*}(X_2^{*} \otimes I_E)(I_B \otimes Y)(X_1 \otimes I_E)V\, | \psi_1 \rangle = 0.$$

Hence, the bilinear form of the operator $I_B \otimes Y$ vanishes on $\mathcal{M}$ and hence $Y = 0$,
by the minimality of the representation. Thus, $y_{jk} = 0$, i.e. the system $\{V_j^{*}V_k\}$ is
linearly independent.

Conversely, let the system $\{V_j^{*}V_k\}$ be linearly independent. Let $\Phi^{*} =
\frac{1}{2}(\Phi_1^{*} + \Phi_2^{*})$, where $\Phi_{1}^{*}, \Phi_{2}^{*}$ are unital CP maps. In this case, the Kraus operators for
$\Phi_1^{*}$, $\Phi_2^{*}$ are also Kraus operators for some representation of $\Phi^{*}$, and hence are
linearly expressible through $\{V_k\}$, by Exercise 6.15, so that $\Phi_1^{*}[X] = \sum_{jk} z_{jk} V_j^{*} X V_k$.
Since $\Phi_1^{*}[I_B] = I_A$, we have

$$\sum_{jk} z_{jk} V_j^{*}V_k = \sum_{jk} \delta_{jk} V_j^{*}V_k.$$

Hence, $z_{jk} = \delta_{jk}$ by linear independence. Therefore, $\Phi_1^{*} = \Phi^{*}$ and thus $\Phi^{*}$ is
extreme. □

By Exercise 1.5, the dimensionality of $\mathfrak{B}(\mathcal{H}_A)$ is $d_A^{2}$. Hence, the rank $d$ of an
extreme map should satisfy $d^{2} \le d_A^{2}$, i.e. $d \le d_A$, as compared to $d \le d_A d_B$ (see (6.9))
for an arbitrary map.

6.3 Definition of the channel 

Let us provide a physical interpretation of the property of complete positivity, via
the picture of the (irreversible) evolution of an open system interacting with the
environment. Let $\mathcal{H}$ be the Hilbert space of the system, $\mathcal{H}_E$ the Hilbert space of its
environment, and $S_E$ the initial state of the environment. Assume that the system
interacts with the environment via a unitary operator $U$. The evolution of the system
is then given by the formula

$$\Phi[S] = \operatorname{Tr}_E U(S \otimes S_E)U^{*}. \qquad (6.22)$$

Theorem 6.18. Every linear, trace-preserving, completely positive map, with
$\mathcal{H}_A = \mathcal{H}_B = \mathcal{H}$, can be extended to the evolution of an open system interacting with
an environment so that relation (6.22) holds.

Proof. Consider the space $\mathcal{H}_E$ of $d$-dimensional vectors, where $d$ is the number of
components in the Kraus representation of $\Phi$. In this case, $\mathcal{H} \otimes \mathcal{H}_E$ can be considered
as the direct sum of $d$ copies of $\mathcal{H}$, consisting of column vectors $[\psi_1, \dots, \psi_d]^{T}$,
$\psi_j \in \mathcal{H}$, while operators in this space are the block matrices $[Y_{jk}]_{j,k=1,\dots,d}$, $Y_{jk} \in
\mathfrak{B}(\mathcal{H})$. Consider the vector $|\psi_E\rangle = [1, 0, \dots, 0]^{T}$. In this case,

$$S \otimes |\psi_E\rangle\langle\psi_E| = \begin{bmatrix} S & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}.$$

Introduce the operator

$$U = \begin{bmatrix} V_1 & \cdots & \cdots \\ \vdots & \cdots & \cdots \\ V_d & \cdots & \cdots \end{bmatrix}, \qquad (6.23)$$

where the first column consists of $V_1, \dots, V_d$ and the rest of the columns are chosen
to complement it to a unitary operator, which is possible due to (6.19). Now,

$$U(S \otimes |\psi_E\rangle\langle\psi_E|)U^{*} = [V_j S V_k^{*}]_{j,k=1,\dots,d}.$$

The partial trace in $\mathcal{H} \otimes \mathcal{H}_E$, with respect to $\mathcal{H}_E$, is just the sum of the diagonal
elements of this matrix, which coincides with the Kraus representation for the
map $\Phi$. □


The preceding discussion motivates the following definition. We denote by A the 
input system of the channel and by B the output system. 


Definition 6.19. A channel in the Schrödinger picture is a linear, completely positive
trace-preserving map $\Phi : \mathfrak{T}(\mathcal{H}_A) \to \mathfrak{T}(\mathcal{H}_B)$. Dually, a channel in the Heisenberg
picture is a linear, completely positive unital map $\Phi^{*} : \mathfrak{B}(\mathcal{H}_B) \to \mathfrak{B}(\mathcal{H}_A)$.


The channel $\Phi$ is called bistochastic if it maps the chaotic state in $\mathcal{H}_A$ into the
chaotic state in $\mathcal{H}_B$, $\Phi[I_A/d_A] = I_B/d_B$. If the input and output systems have equal
dimensionalities ($d_A = d_B$), the bistochastic channel $\Phi$ is unital. In this case, $\Phi^{*}$ is
also trace-preserving, so that both $\Phi$ and $\Phi^{*}$ are channels in both the Schrödinger and
Heisenberg pictures.

Let $S = S_A$ be an input state of a channel $\Phi$. Denote by $\mathcal{L} = \operatorname{supp} S$ the support
of the density operator $S$. Consider a purification $S_{AR} = |\psi_{AR}\rangle\langle\psi_{AR}|$ of the state
$S_A$, where $R$ labels the purifying system and

$$|\psi_{AR}\rangle = \sum_j \sqrt{\lambda_j}\, |e_j\rangle \otimes |h_j\rangle; \qquad \lambda_j > 0, \quad e_j \in \mathcal{L},$$

cf. (3.9). Let $\operatorname{Id}_R$ be the identity map in $\mathfrak{T}(\mathcal{H}_R)$. In this case, the tensor product
$\Phi \otimes \operatorname{Id}_R$ describes a channel from $AR$ to $BR$, transforming $A$ into $B$ and leaving $R$
intact. Therefore, $R$ is called the reference system. The output state of this channel is

$$(\Phi \otimes \operatorname{Id}_R)[S_{AR}] = S_{BR}. \qquad (6.24)$$

Since the state $S_{AR}$ is, in general, entangled, this operation is sometimes called
"entanglement transmission". Notice that by the trace preservation property of $\Phi$, the
state $S_{BR}$ has the same partial state $S_R$ as the input state $S_{AR}$. Since $S_R$ is a copy
of the state $S_A$, the state $S_{BR}$ can be considered a quantum substitute of the joint
distribution of the input $A$ and the output $B$ of the channel $\Phi$.
Proposition 6.20. The state $S_{BR}$ uniquely determines the restriction $\Phi_{\mathcal{L}}$ of the
channel $\Phi$ to the states $S$ with $\operatorname{supp} S \subset \mathcal{L}$. In particular, if $S_A$ is nondegenerate, $S_{BR}$
uniquely determines the channel $\Phi$.

Proof. One has

$$(\Phi \otimes \operatorname{Id}_R)[S_{AR}] = \sum_{j,k} \sqrt{\lambda_j}\sqrt{\lambda_k}\; \Phi[|e_j\rangle\langle e_k|] \otimes |h_j\rangle\langle h_k|. \qquad (6.25)$$

The collection of operators $\Phi[|e_j\rangle\langle e_k|]$ uniquely determines $\Phi_{\mathcal{L}}$. Hence, the
statement follows. □

In particular, taking the maximally entangled state

$$S_{AR} = \frac{1}{d} \sum_{j,k} |e_j\rangle\langle e_k| \otimes |h_j\rangle\langle h_k|,$$

we obtain from (6.25), for an arbitrary state $S$,

$$\Phi[S] = d\, \operatorname{Tr}_R S_{BR}(I_B \otimes S_R^{T}), \qquad (6.26)$$

where $S_R^{T} = \sum_{j,k} \langle e_k | S | e_j \rangle\, |h_j\rangle\langle h_k|$ and the state $S_{BR}$ satisfies the characteristic
condition $\operatorname{Tr}_B S_{BR} = S_R = I_R/d$. This one-to-one correspondence between
channels and states is called the Choi-Jamiołkowski correspondence.
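
Formula (6.26) can be tried out on a small example. The sketch below is an illustration under stated assumptions: the qubit channel is given by an amplitude-damping Kraus pair chosen here for the purpose, the reference basis $\{h_j\}$ is identified with $\{e_j\}$, and the function names are hypothetical. It builds $S_{BR}$ from the images of the matrix units, recovers $\Phi[S]$ through (6.26), and computes $\operatorname{Tr}_B S_{BR}$.

```python
def apply_kraus(kraus, s):
    """Phi[S] = sum_k V_k S V_k^* for nested-list matrices."""
    db, da = len(kraus[0]), len(kraus[0][0])
    return [[sum(v[b][i] * s[i][j] * complex(v[c][j]).conjugate()
                 for v in kraus for i in range(da) for j in range(da))
             for c in range(db)] for b in range(db)]

def choi_state(kraus):
    """S_BR = (1/d) sum_{j,k} Phi[|e_j><e_k|] ⊗ |h_j><h_k|; row index is d*b + j."""
    db, d = len(kraus[0]), len(kraus[0][0])
    s_br = [[0j] * (db * d) for _ in range(db * d)]
    for j in range(d):
        for k in range(d):
            unit = [[1.0 if (r, c) == (j, k) else 0.0 for c in range(d)]
                    for r in range(d)]
            image = apply_kraus(kraus, unit)
            for b in range(db):
                for c in range(db):
                    s_br[b * d + j][c * d + k] = image[b][c] / d
    return s_br

def channel_from_choi(s_br, s, db):
    """Phi[S] = d Tr_R S_BR (I_B ⊗ S^T); the transpose turns into s[j][k] below."""
    d = len(s)
    return [[d * sum(s_br[b * d + j][c * d + k] * s[j][k]
                     for j in range(d) for k in range(d))
             for c in range(db)] for b in range(db)]

def reduced_r(s_br, db, d):
    """Partial trace over B, leaving an operator on the reference system."""
    return [[sum(s_br[b * d + j][b * d + k] for b in range(db))
             for k in range(d)] for j in range(d)]

# Illustrative channel and input state (assumptions of this sketch).
kraus = [[[1.0, 0.0], [0.0, 0.8]], [[0.0, 0.6], [0.0, 0.0]]]
state = [[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]]

choi = choi_state(kraus)
direct = apply_kraus(kraus, state)
via_choi = channel_from_choi(choi, state, 2)
reduced = reduced_r(choi, 2, 2)
```

The two outputs `direct` and `via_choi` agree, and `reduced` is $I_R/d$, which reflects the characteristic condition for a trace-preserving map.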

6.4 Entanglement-breaking and PPT channels 

In quantum information theory one often has to deal with both quantum and classical
information. A usual device is to embed the classical system into a quantum system,
as in Section 2.1.1, by representing classical states and observables on the phase space
$\Omega$ as diagonal operators in the artificial Hilbert space $\mathcal{H}$ spanned by the orthonormal
basis $\{|\omega\rangle;\ \omega \in \Omega\}$.

i. Any classical-quantum (c-q) channel $x \to S_x$ describing an encoding of the
classical input $x$ into the quantum state $S_x$, as considered in Chapter 5, can then
be extended to the quantum channel

$$\Phi[S] = \sum_x \langle e_x | S | e_x \rangle\, S_x, \qquad S \in \mathfrak{S}(\mathcal{H}_A),$$

where $\{e_x\}$ is an orthonormal basis in $\mathcal{H}_A$.

ii. The quantum-classical (q-c) channel corresponding to the measurement of an
observable $M = \{M_y\}$ in $\mathcal{H}_A$ (see Section 6.5 below), which produces the
probability distribution from a quantum state, gives rise to the quantum channel

$$\Phi[S] = \sum_y |e_y\rangle\langle e_y|\, \operatorname{Tr} S M_y, \qquad S \in \mathfrak{S}(\mathcal{H}_A),$$

where $\{e_y\}$ is an orthonormal basis in $\mathcal{H}_B$. The dual channel acts as

$$\Phi^{*}[X] = \sum_y \langle e_y | X | e_y \rangle\, M_y. \qquad (6.27)$$

The Stinespring representation (6.6) in this case provides Naimark's dilation for
the observable $M$. By taking $X = |e_y\rangle\langle e_y|$, we obtain

$$M_y = V^{*}(|e_y\rangle\langle e_y| \otimes I_E)V,$$

where $E = \{|e_y\rangle\langle e_y| \otimes I_E\}$ is the sharp observable.

iii. Taking a composition of q-c and c-q channels with the same orthonormal basis
$\{e_j\}$, we obtain the q-c-q channel

$$\Phi[S] = \sum_j S_j\, \operatorname{Tr} S M_j, \qquad (6.28)$$

of which i. and ii. are particular cases. Such channels have a classical system
between the input and the output (represented by the symbol $j$).

Definition 6.21. A channel $\Phi$ from $A$ to $B$ is entanglement-breaking if for an arbitrary
input state $S_{AR}$ of the channel $\Phi \otimes \operatorname{Id}_R$, its output state $S_{BR}$ is separable
(Definition 3.13), i.e. a mixture of product states

$$S_{BR} = \sum_j p_j\, S_B^{j} \otimes S_R^{j}, \qquad (6.29)$$

where $\{p_j\}$ is a probability distribution.

The following characterization is due to M. Horodecki, Shor and Ruskai [121]. 

Proposition 6.22. The following properties are equivalent:

i. the channel $\Phi$ is entanglement-breaking

ii. the state $S_{BR}$ in the Choi-Jamiołkowski representation (6.26) is separable

iii. $\Phi$ is a q-c-q channel (6.28)

Proof.

i. ⇒ ii. is trivial.

ii. ⇒ iii. Taking for $S_{AR}$ the maximally entangled state, and using relations (6.29)
and (6.26), we obtain the representation (6.28), in which $S_j = S_B^{j}$, $M_j =
d\, p_j (S_R^{j})^{T}$.

iii. ⇒ i. For a channel (6.28) and any input state $S_{AR}$ of the channel $\Phi \otimes \operatorname{Id}_R$,
relation (6.29) holds, with $S_B^{j} = S_j$, $p_j S_R^{j} = \operatorname{Tr}_A S_{AR}(M_j \otimes I_R)$. □

Example 6.23. An important extreme case of a c-q channel is the completely
depolarizing channel, describing the irreversible evolution to some final state $S_f$:

$$\Phi[S] = S_f \cdot \operatorname{Tr} S, \qquad S \in \mathfrak{T}(\mathcal{H}). \qquad (6.30)$$

Exercise 6.24. Prove complete positivity and find a Kraus representation in the
examples above.

Definition 6.25. A channel $\Phi$ from $A$ to $B$ is PPT if, for an arbitrary input state $S_{AR}$ of
the channel $\Phi \otimes \operatorname{Id}_R$, its output state $S_{BR}$ has a positive partial transpose in the space
$\mathcal{H}_R$ (see the end of Section 3.1.3).

Evidently, every entanglement-breaking channel is PPT (but not vice versa, see
Horodeckis' paper [118]; a PPT channel which is not entanglement-breaking is called
entanglement-binding). If the channel $\Phi$ is given by the Kraus decomposition (6.18),
the channel $\Phi^{T}$ is defined by the relation

$$\Phi^{T}[S] = \sum_k (V_k^{*})^{T} S\, V_k^{T},$$

assuming some bases are fixed in the input and output spaces of the channel. Taking
into account that $S_{AR}^{T} = S_{AR}$, we obtain from (6.24), for the transposition in the basis
$\{e_j \otimes h_k\}$,

$$(\Phi^{T} \otimes \operatorname{Id}_R)[S_{AR}] = S_{BR}^{T}.$$

It follows that the channels $\Phi$ and $\Phi^{T}$ are simultaneously PPT or not PPT.

Exercise 6.26. Prove the relation

$$\Phi \circ T_A = T_B \circ \Phi^{T}, \qquad (6.31)$$

where $T_A$, $T_B$ denote transpositions in the corresponding spaces.

Proposition 6.27. The following conditions are equivalent:

i. The channel $\Phi$ is PPT.

ii. The state $S_{BR}$ in the Choi-Jamiołkowski representation (6.26) has a positive
partial transpose.

iii. $\Phi \circ T_A$ or $T_B \circ \Phi$ is a channel.
Proof. Again i. ⇒ ii. is trivial.

ii. ⇒ iii. Taking for $S_{AR}$ the maximally entangled state and using the relation (6.26),
we obtain

$$(\Phi \circ T_A)[S] = d\, \operatorname{Tr}_R S_{BR}(I_B \otimes S_R) = d\, \operatorname{Tr}_R S_{BR}^{T_R}(I_B \otimes S_R^{T}),$$

which is a channel, because $S_{BR}^{T_R} \ge 0$. By relation (6.31) and the remark
preceding it, this is equivalent to $T_B \circ \Phi$ being a channel.

iii. ⇒ i. Let $T_B \circ \Phi$ be a channel. In this case, for any input state $S_{AR}$ of the channel
$\Phi \otimes \operatorname{Id}_R$ we have

$$\big((\Phi \otimes \operatorname{Id}_R)[S_{AR}]\big)^{T_R} = \big(((T_B \circ \Phi) \otimes \operatorname{Id}_R)[S_{AR}]\big)^{T} \ge 0. \qquad □$$

6.5 Quantum measurement processes 

An important example of irreversible evolution is the state change of the system due to
a quantum measurement. A complete ideal quantum measurement is associated with
an orthonormal basis $\{|e_x\rangle\}$, indexed by the measurement outcomes $x$. It is postulated
that the initial state $S$ of the system after such a measurement is transformed into
one of the states $|e_x\rangle\langle e_x|$ with probability $\langle e_x | S | e_x \rangle$. Thus, the post-measurement
statistical ensemble splits into subensembles corresponding to different measurement
outcomes $x$, and, as a whole, is described by the density operator

$$S' = \sum_x |e_x\rangle\langle e_x | S | e_x \rangle\langle e_x|. \qquad (6.32)$$

Notice that the map $S \to S'$ is a particular case of a q-c channel corresponding to the
measurement of an observable $M = \{|e_x\rangle\langle e_x|\}$.

Under an incomplete ideal measurement some outcomes are joined together
(coarse-grained), forming the orthogonal resolution of the identity $E = \{E_x\}$,

$$E_x E_{x'} = \delta_{xx'} E_x, \qquad \sum_x E_x = I,$$

in other words, a sharp observable in the space $\mathcal{H}$. According to the von Neumann-Lüders
projection postulate, the ideal measurement of the sharp observable $E$ gives
an outcome $x$ with the probability

$$p_x = \operatorname{Tr} S E_x = \operatorname{Tr} E_x S E_x,$$

and the posterior state, i.e. the post-measurement state of the subensemble in which
the outcome $x$ has occurred, is

$$S_x = \frac{E_x S E_x}{\operatorname{Tr} E_x S E_x},$$

if $p_x > 0$.
We then have the quantum analog of the Bayes formula for the state of the whole
ensemble

$$S' = \sum_x p_x S_x = \sum_x E_x S E_x \equiv \mathcal{E}[S]. \qquad (6.33)$$

Note that, as distinct from the classical Bayes formula, the quantum state (6.33) of
the whole ensemble after the ideal measurement can be different from the initial state
$S$. Thus, even an ideal quantum measurement does not reduce to a simple reading
of the outcome and involves an inevitable interaction, which changes the state of the
system. This is a fundamental difference between quantum observables and classical
random variables, the observation of which does not change the statistical ensemble
and reduces to mere selection of its representatives according to the values of a random
variable.
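
A small numerical illustration of the projection postulate, given as a sketch only: the qutrit, the coarse-grained pair of projectors, and the function names below are assumptions chosen for this example. It computes the probabilities $p_x = \operatorname{Tr} S E_x$ and the state $\sum_x E_x S E_x$ of the whole ensemble, and applies the same map a second time.

```python
def matmul(a, b):
    """Matrix product of nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(a):
    """Sum of the diagonal elements."""
    return sum(a[i][i] for i in range(len(a)))

def luders(projectors, s):
    """Return the probabilities Tr(S E_x) and the ensemble state sum_x E_x S E_x."""
    probs = [trace(matmul(s, e)) for e in projectors]
    n = len(s)
    total = [[0.0] * n for _ in range(n)]
    for e in projectors:
        block = matmul(matmul(e, s), e)
        total = [[t + x for t, x in zip(rt, rb)] for rt, rb in zip(total, block)]
    return probs, total

# Incomplete measurement on a qutrit: outcomes 0 and 1 are joined together.
e_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
e_b = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

# Pure state with equal amplitudes: every matrix element equals 1/3.
state = [[1.0 / 3.0] * 3 for _ in range(3)]

probs, after = luders([e_a, e_b], state)
probs_again, after_again = luders([e_a, e_b], after)
```

The coherences between the two blocks are removed, so `after` differs from `state`, while a second application leaves `after` and the outcome probabilities unchanged.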

In the Heisenberg picture, the transformation of observables due to an ideal
measurement is described by the completely positive map

$$\mathcal{E}^{*}[X] = \sum_x E_x X E_x = \mathcal{E}[X], \qquad (6.34)$$

which can be interpreted as a conditional expectation onto the algebra of operators
commuting with all of $E_x$, characterized by the properties

i. $\mathcal{E}^{2} = \mathcal{E}$

ii. $\mathcal{E}[X\mathcal{E}[Y]] = \mathcal{E}[X]\,\mathcal{E}[Y]$, $X, Y \in \mathfrak{B}(\mathcal{H})$.

An ideal quantum measurement satisfies the repeatability hypothesis: the outcome 
of the repeated measurement is equal, with probability one, to the outcome of the first 
measurement (provided no evolution occurred between the two measurements). Most 
of the real measurement procedures do not fulfill this requirement, and we pass to a 
mathematical description of non-ideal measurements. 

Consider the process of indirect measurement, when the system $\mathcal{H}$ in the state
$S$ first interacts with a probe system $\mathcal{H}_E$ in the initial state $S_E$, and next an ideal
measurement of a sharp observable $\{E_x^{0}\}$ is made over the probe system. In this case,
according to the above, the probability of an outcome $x$ equals

$$p_x = \operatorname{Tr} U(S \otimes S_E)U^{*}(I \otimes E_x^{0}),$$

and the posterior state is described by the density operator $S_x$, such that

$$p_x S_x = \operatorname{Tr}_E U(S \otimes S_E)U^{*}(I \otimes E_x^{0}) \equiv \Phi_x[S]. \qquad (6.35)$$

Thus, the state of the system after the measurement can be written as

$$\Phi[S] = \sum_x p_x S_x = \sum_x \Phi_x[S].$$
The maps $\Phi_x$ are completely positive, and their sum $\Phi$ is trace preserving. Any
such family $\{\Phi_x\}$ is called an instrument. The instruments describe the statistics
and posterior states of non-ideal measurements, which in general need not satisfy the
repeatability hypothesis.

Note that

$$p_x = \operatorname{Tr} S M_x, \quad \text{where} \quad M_x = \Phi_x^{*}[I]$$

is the resolution of the identity describing the observable associated with the
instrument and the indirect measurement. Conversely, given an observable $M = \{M_x\}$,
one can construct a (non-unique) instrument and indirect measurement with which it
is associated. Consider a decomposition $M_x = V_x^{*}V_x$ and the corresponding unitary
operator (6.23) in the space $\mathcal{H} \otimes \mathcal{H}_E$. In this case, the family of completely positive
maps

$$\Phi_x[S] = \operatorname{Tr}_E U(S \otimes |\psi_E\rangle\langle\psi_E|)U^{*}(I \otimes |e_x\rangle\langle e_x|), \qquad (6.36)$$

where $|e_x\rangle$ is the column vector with 1 in the $x$-th place and zeroes otherwise, so that
$|\psi_E\rangle = |e_1\rangle$, forms an instrument. Further,

$$M_x = \operatorname{Tr}_E (I \otimes |\psi_E\rangle\langle\psi_E|)\, U^{*}(I \otimes |e_x\rangle\langle e_x|)\, U = \Phi_x^{*}[I]. \qquad (6.37)$$

Thus, the process of indirect measurement of the observable $M$ is realized by the
probe system $\mathcal{H}_E$ in the initial state $|\psi_E\rangle\langle\psi_E|$, the unitary operator $U$ in the space
$\mathcal{H} \otimes \mathcal{H}_E$, and the sharp observable $\{|e_x\rangle\langle e_x|\}$ in $\mathcal{H}_E$.

Exercise 6.28. For a complete ideal measurement with $M_x = |e_x\rangle\langle e_x|$, set $V_x =
M_x$ and construct the unitary operator $U$ by use of formula (6.23) (in this case
$\mathcal{H}_E = \mathcal{H}$). Show that

$$U(|e_x\rangle \otimes |e_1\rangle) = |e_x\rangle \otimes |e_x\rangle.$$

For an arbitrary vector $\psi \in \mathcal{H}$

$$U(|\psi\rangle \otimes |e_1\rangle) = \sum_x \langle e_x | \psi \rangle\, (|e_x\rangle \otimes |e_x\rangle),$$

thus, an interaction $U$ produces entanglement between the main and the probe
systems, which allows us to reduce the ideal measurement in the main system to the
measurement of the sharp observable $\{|e_x\rangle\langle e_x|\}$ in the probe system.

6.6 Complementary channels 

Given three quantum systems $A, B, C$ with the spaces $\mathcal{H}_A, \mathcal{H}_B, \mathcal{H}_C$ and a linear
operator $V : \mathcal{H}_A \to \mathcal{H}_B \otimes \mathcal{H}_C$, the relations

$$\Phi_B[S] = \operatorname{Tr}_C V S V^*, \quad \Phi_C[S] = \operatorname{Tr}_B V S V^*; \quad S \in \mathfrak{T}(\mathcal{H}_A) \tag{6.38}$$

define two CP maps $\Phi_B : \mathfrak{T}(\mathcal{H}_A) \to \mathfrak{T}(\mathcal{H}_B)$, $\Phi_C : \mathfrak{T}(\mathcal{H}_A) \to \mathfrak{T}(\mathcal{H}_C)$, which
will be called mutually complementary. If $V$ is an isometry, both maps are trace
preserving, i.e. channels.
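
As an illustration of (6.38), the following sketch builds a pair of mutually complementary channels from a randomly chosen isometry. The dimensions, the random seed, and the helper name `partial_trace` are assumptions made for the example only.

```python
import numpy as np

def partial_trace(rho, dB, dC, keep):
    """Partial trace on H_B (x) H_C: keep='B' traces out C, keep='C' traces out B."""
    r = rho.reshape(dB, dC, dB, dC)
    return np.einsum('icjc->ij', r) if keep == 'B' else np.einsum('bibj->ij', r)

dA, dB, dC = 2, 2, 3
rng = np.random.default_rng(0)
# Random isometry V: H_A -> H_B (x) H_C, taken from a reduced QR decomposition.
M = rng.normal(size=(dB * dC, dA)) + 1j * rng.normal(size=(dB * dC, dA))
V, _ = np.linalg.qr(M)                   # columns are orthonormal, so V* V = I

def phi_B(S):
    return partial_trace(V @ S @ V.conj().T, dB, dC, 'B')

def phi_C(S):
    return partial_trace(V @ S @ V.conj().T, dB, dC, 'C')

S = np.array([[0.7, 0.2j], [-0.2j, 0.3]])   # a qubit density matrix
out_B, out_C = phi_B(S), phi_C(S)
```

Because `V` is an isometry, both outputs have unit trace, in line with the remark that both maps are then channels.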



Figure 6.1. Complementary channel. 

The Stinespring dilation theorem implies that for a given CP map (channel) a
complementary map always exists. Moreover, the complementary map is unique in the
following sense: for a given CP map $\Phi_B$, any two maps $\Phi_C, \Phi_{C'}$ complementary to $\Phi_B$ are
isometrically equivalent in the sense that there is a partial isometry $W : \mathcal{H}_C \to \mathcal{H}_{C'}$
such that

$$\Phi_{C'}[S] = W \Phi_C[S] W^*, \quad \Phi_C[S] = W^* \Phi_{C'}[S] W, \tag{6.39}$$

for all $S$. This is a consequence of Theorem 6.12. If the dimensionalities of $\mathcal{H}_C, \mathcal{H}_{C'}$
are minimal, $W$ is an isometry from $\mathcal{H}_C$ onto $\mathcal{H}_{C'}$. In this case, the complementary
map is called minimal.

If $A = B$ is an open quantum system that interacts with the environment $C = E$
in an arbitrary initial state $S_E$, and $\Phi_B$ is the corresponding channel

$$\Phi_B[S] = \operatorname{Tr}_E U (S \otimes S_E) U^*, \tag{6.40}$$

describing the state change of $A$, the final state of the environment is the output of the
channel

$$\Phi_E[S] = \operatorname{Tr}_B U (S \otimes S_E) U^*. \tag{6.41}$$

We will call the channels $\Phi_B, \Phi_E$ mutually weakly complementary. If the state of the
environment is pure, $S_E = |\psi_E\rangle\langle\psi_E|$, then introducing the isometry $V = U|\psi_E\rangle :
\mathcal{H}_A \to \mathcal{H}_B \otimes \mathcal{H}_E$, which acts as follows

$$V|\psi\rangle = U(|\psi\rangle \otimes |\psi_E\rangle), \quad \psi \in \mathcal{H}_A,$$

we see that $\Phi_E$ is the complementary of $\Phi_B$. If $S_E$ is not pure, then by taking its
purification $S_{E'}$ in the space $\mathcal{H}_{E'} = \mathcal{H}_E \otimes \mathcal{H}_R$ and letting $U' = U \otimes I_R$, we obtain
a representation of the type (6.41), with $E$ replaced by $E'$, where $S_{E'}$ is a pure state.
Hence, $\Phi_{E'}$ is the complementary of $\Phi_B$ and $\Phi_E[S] = \operatorname{Tr}_R \Phi_{E'}[S]$. In contrast to
the complementary map, the weak complementary is not unique and depends on the
representation (6.40).

To simplify the formulas we shall also use the notation $\tilde\Phi$ for the map that is
complementary to $\Phi$.

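Assume that a CP map $\Phi : \mathfrak{T}(\mathcal{H}_A) \to \mathfrak{T}(\mathcal{H}_B)$ is given by the Kraus representation

$$\Phi[S] = \sum_{k=1}^{d} V_k S V_k^*.$$

In this case, a complementary map $\tilde\Phi : \mathfrak{T}(\mathcal{H}_A) \to \mathfrak{M}_d$ is given by

$$\tilde\Phi[S] = [\operatorname{Tr} V_k S V_l^*]_{k,l=1,\dots,d} = \sum_{k,l=1}^{d} (\operatorname{Tr} S V_l^* V_k) |e_k\rangle\langle e_l|, \tag{6.42}$$

where $\{e_k\}$ is the canonical basis for the coordinate space $\mathbb{C}^d$, which plays the role of
$\mathcal{H}_C$, so that $d = d_C$. This statement follows from the fact that

$$V = \sum_{k=1}^{d} \oplus V_k$$

is a map from $\mathcal{H}_A$ to $\sum_{k=1}^{d} \oplus \mathcal{H}_B \simeq \mathcal{H}_B \otimes \mathcal{H}_C$, for which $\Phi, \tilde\Phi$ are given by the
partial traces (6.38), see Exercise 3.6.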
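
Formula (6.42) translates directly into code. The sketch below is a minimal illustration, assuming a qubit amplitude-damping channel with an arbitrary damping parameter `g` as the example (this channel is not taken from the text) and a hypothetical helper `complementary`.

```python
import numpy as np

def complementary(kraus, S):
    """Matrix [Tr V_k S V_l^*]_{k,l} of formula (6.42)."""
    d = len(kraus)
    return np.array([[np.trace(kraus[k] @ S @ kraus[l].conj().T) for l in range(d)]
                     for k in range(d)])

g = 0.3                                   # assumed damping parameter
V1 = np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex)
V2 = np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)
kraus = [V1, V2]

S = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)   # pure input state
out_B = sum(V @ S @ V.conj().T for V in kraus)          # channel output
out_C = complementary(kraus, S)                          # environment output
```

For a pure input the joint output state is pure, so `out_B` and `out_C` should share the same nonzero eigenvalues; this gives a quick consistency check of the construction.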

Exercise 6.29. Check by direct computation that, applying the same procedure
to $\tilde\Phi$, one obtains the map $\tilde{\tilde\Phi}$, which is isometrically equivalent in the sense
of (6.39) to $\Phi$.

Exercise 6.30. Prove the following statement: channel $\Phi$ is extreme if and only
if there exists a complementary channel $\tilde\Phi$ such that $\operatorname{Ker} \tilde\Phi^* = 0$. Hint: prove
the relation $\tilde\Phi^*[Y] = \sum_{jk} y_{jk} V_j^* V_k$, where $y_{jk} = \langle e_j|Y|e_k\rangle$, and use
Proposition 6.17.

Example 6.31. Consider the channel

$$\Phi(S) = S \otimes S_C, \tag{6.43}$$

where $S_C$ is a fixed state in the space $\mathcal{H}_C$. By writing the spectral decomposition

$$S_C = \sum_{k=1}^{d} \lambda_k |e_k\rangle\langle e_k|,$$

one arrives at the Kraus representation with the operators $V_k = \sqrt{\lambda_k}\,(I \otimes |e_k\rangle) :
\mathcal{H}_A \to \mathcal{H}_A \otimes \mathcal{H}_C$. Hence, one obtains from (6.42)

$$\tilde\Phi(S) = [\lambda_k \delta_{kl} \operatorname{Tr} S]_{k,l=1,\dots,d} = S_C \operatorname{Tr} S,$$

which is a completely depolarizing channel. If $S_C$ is pure, (6.43) is essentially the
same as the ideal (identity) channel Id and $\tilde\Phi$ is its complementary. Whatever the
state $S_C$, the ideal channel Id is weakly complementary for this channel.
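
Example 6.31 can be verified in a few lines. In the sketch below the eigenvalues `lam` of $S_C$ and the input state `S` are arbitrary illustrative choices, and $S_C$ is written in its own eigenbasis.

```python
import numpy as np

lam = np.array([0.5, 0.3, 0.2])           # assumed eigenvalues of S_C
dA, d = 2, len(lam)
e = np.eye(d)
# V_k = sqrt(lam_k) (I (x) |e_k>): a (dA*d) x dA matrix from H_A to H_A (x) H_C
kraus = [np.sqrt(lam[k]) * np.kron(np.eye(dA), e[:, [k]]) for k in range(d)]

S = np.array([[0.7, 0.1], [0.1, 0.3]])
out = sum(V @ S @ V.T for V in kraus)     # should equal S (x) S_C
comp = np.array([[np.trace(kraus[k] @ S @ kraus[l].T) for l in range(d)]
                 for k in range(d)])       # formula (6.42)
```

Here `out` equals `np.kron(S, np.diag(lam))` and `comp` equals `np.diag(lam) * np.trace(S)`, so the environment output carries no dependence on the input state.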

This example illustrates the fact that perfect transmission of quantum information
from input $A$ to output $B$ is equivalent to no information going from $A$ to the
environment $C$, and vice versa. Later in this book, we shall see that an approximate version
of this complementarity plays a basic role in quantum capacity theorems: the closer
the channel $\Phi$ is to an ideal one, the closer its complement $\tilde\Phi$ is to the completely
depolarizing one.

The completely depolarizing channel is an instance of an entanglement-breaking
(q-c-q) channel (see Section 6.4); the construction of the complementary channel can be
generalized to the whole of this class. For this, we need the following characterization
of entanglement-breaking channels.

Proposition 6.32. Channel $\Phi : \mathfrak{T}(\mathcal{H}_A) \to \mathfrak{T}(\mathcal{H}_B)$ is entanglement-breaking if and
only if it has a Kraus representation with rank one operators $V_k$:

$$\Phi[S] = \sum_{k=1}^{d} |\varphi_k\rangle\langle\psi_k| S |\psi_k\rangle\langle\varphi_k|, \tag{6.44}$$

where

$$\sum_{k=1}^{d} |\psi_k\rangle\langle\varphi_k|\varphi_k\rangle\langle\psi_k| = I.$$

Proof. Let the channel $\Phi$ admit the representation (6.44). In this case, making a
renormalization, we can always assume that $\langle\varphi_k|\varphi_k\rangle = 1$, in which case $\{|\psi_k\rangle\}$ is an
overcomplete system. Denoting $S_k = |\varphi_k\rangle\langle\varphi_k|$ and $M_k = |\psi_k\rangle\langle\psi_k|$, we rewrite (6.44)
in the form (6.28). Hence, $\Phi$ is entanglement-breaking.

Conversely, by making the spectral decompositions of the operators $S_j, M_j$ in (6.28),
we obtain a Kraus representation of the form (6.44). □

The complementary channel $\tilde\Phi : \mathfrak{T}(\mathcal{H}_A) \to \mathfrak{M}_d$ has the form

$$\tilde\Phi[S] = [c_{kl} \langle\psi_k|S|\psi_l\rangle]_{k,l=1,\dots,d} = \sum_{k,l=1}^{d} c_{kl} |e_k\rangle\langle\psi_k|S|\psi_l\rangle\langle e_l|, \tag{6.45}$$

where $c_{kl} = \langle\varphi_l|\varphi_k\rangle$. In the special case where the system $\{\psi_k\}$ is an orthonormal
basis in $\mathcal{H}_A$, (6.45) is a channel called dephasing. The reason for this name is
that such channels describe the suppression of the off-diagonal elements of the
density matrix, due to the fact that $|c_{kl}| \le 1$. In particular, the completely dephasing
channel corresponds to $c_{kl} = \delta_{kl}$. From (6.44) we see that the dephasing channels
are complementary to c-q channels. Note that using the Schur (element-wise) product
of matrices $*$, formula (6.45) can in this case be written in the compact form

$$\tilde\Phi[S] = C * S,$$

where $C = [c_{kl}]$ and $S$ is the density matrix in the orthonormal basis $\{\psi_k\}$.

For q-c channels, $\{\varphi_k\}$ is an orthonormal basis in $\mathcal{H}_B$, so that $c_{kl} = \delta_{kl}$, and
the complementary channel is again a q-c channel.
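
The Schur-product form of the dephasing channel can be compared with (6.42) numerically. In this sketch the unit vectors $\varphi_k$ and the input state are random, and $\{\psi_k\}$ is taken to be the standard basis; all of these are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
# Columns of phi are unit vectors phi_k (random, for illustration only).
phi = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
phi /= np.linalg.norm(phi, axis=0)
C = phi.T @ phi.conj()                    # C[k, l] = <phi_l|phi_k>

A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
S = A @ A.conj().T
S /= np.trace(S).real                     # random density matrix

# Kraus operators V_k = |phi_k><psi_k| with psi_k = e_k, then formula (6.42).
e = np.eye(d)
kraus = [np.outer(phi[:, k], e[k]) for k in range(d)]
comp = np.array([[np.trace(kraus[k] @ S @ kraus[l].conj().T) for l in range(d)]
                 for k in range(d)])
schur = C * S                             # element-wise (Schur) product
```

The diagonal of `S` is left unchanged, while every off-diagonal entry is multiplied by a factor of modulus at most one.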

Noting that an arbitrary positive definite matrix $[c_{kl}]$ can be represented as $c_{kl} =
\sum_j \bar v_{lj} v_{kj}$, and denoting

$$\tilde V_j = \sum_{k=1}^{d} v_{kj} |e_k\rangle\langle\psi_k|, \tag{6.46}$$

we have the Kraus representation

$$\tilde\Phi[S] = \sum_{j=1}^{d_B} \tilde V_j S \tilde V_j^* \tag{6.47}$$

for the complementary map. For the dephasing channel, we can take $|e_k\rangle = |\psi_k\rangle$.
Hence, one sees from (6.46) that the dephasing maps are characterized by the property
of having a Kraus representation with simultaneously diagonalizable (i.e. commuting
normal) operators $\tilde V_j$.

Definition 6.33. Let $\{p_j\}$ be a finite probability distribution and $\Phi_j : \mathfrak{T}(\mathcal{H}) \to
\mathfrak{T}(\mathcal{H}_j)$ a collection of channels. Channel $\Phi : \mathfrak{T}(\mathcal{H}) \to \mathfrak{T}(\sum_j \oplus \mathcal{H}_j)$ will be called
orthogonal convex sum of the channels $\Phi_j$,

$$\Phi = \sum_j \oplus\, p_j \Phi_j,$$

if $\Phi[S] = \sum_j \oplus\, p_j \Phi_j[S]$ for all $S \in \mathfrak{S}(\mathcal{H})$.

Example 6.34. Quantum erasure channel. Let $p \in [0,1]$. Now, the channel $\Phi_p :
\mathfrak{T}(\mathcal{H}) \to \mathfrak{T}(\mathcal{H} \oplus \mathbb{C})$, defined by the relation

$$\Phi_p(S) = \begin{bmatrix} (1-p)S & 0 \\ 0 & p \operatorname{Tr} S \end{bmatrix},$$

transmits the input state $S$ with probability $1 - p$ and “erases” it with probability $p$,
sending the erasure signal. This channel is the orthogonal convex sum of the ideal
channel and the completely depolarizing channel (6.30), with a pure final state $S_f$.
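
A minimal sketch of the erasure channel follows. The Kraus operators chosen here (an embedding of $\mathcal{H}$ plus one "flag" operator per basis vector) are one possible choice, assumed for illustration; the function names are hypothetical. With this ordering of the Kraus operators, formula (6.42) returns exactly the erasure channel with parameter $1 - p$.

```python
import numpy as np

def erasure(S, p):
    """Block matrix diag((1-p) S, p Tr S) acting on H (+) C."""
    d = S.shape[0]
    out = np.zeros((d + 1, d + 1), dtype=complex)
    out[:d, :d] = (1 - p) * S
    out[d, d] = p * np.trace(S)
    return out

def erasure_kraus(d, p):
    """One Kraus choice: d flag operators sqrt(p)|flag><j|, then the embedding of H."""
    flag = np.eye(d + 1)[d]
    flags = [np.sqrt(p) * np.outer(flag, np.eye(d)[j]) for j in range(d)]
    emb = np.sqrt(1 - p) * np.eye(d + 1, d)
    return flags + [emb]

p = 0.25
S = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
kraus = erasure_kraus(2, p)
out = sum(V @ S @ V.conj().T for V in kraus)
n = len(kraus)
comp = np.array([[np.trace(kraus[k] @ S @ kraus[l].conj().T) for l in range(n)]
                 for k in range(n)])       # formula (6.42)
```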

Exercise 6.35. Show that the complementary to the orthogonal convex sum of
the channels is the orthogonal convex sum of the complementary channels for the
components:

$$\widetilde{\sum_j \oplus\, p_j \Phi_j} = \sum_j \oplus\, p_j \tilde\Phi_j.$$

Hint: write the Kraus representation for the orthogonal convex sum, basing yourself
on those for the summands, and then use the relation (6.42).

In particular, by using Example 6.31 (with $d = \dim \mathcal{H}_C = 1$), find that the
complementary to the erasure channel $\Phi_p$ is unitarily equivalent to the erasure
channel $\Phi_{1-p}$.

6.7 Covariant channels 

Recall that a projective unitary representation of the group $G$ is a family of unitary
operators $\{V_g;\ g \in G\}$ in a Hilbert space $\mathcal{H}$, satisfying

$$V_g V_h = \lambda(g,h) V_{gh},$$

where $\lambda(g,h)$ is a complex multiplier (of modulus one). If $\lambda(g,h) \equiv 1$, we call it a
unitary representation (see e.g. [153] for more detail).

Let $G$ be a group and $g \to V_g^j;\ g \in G$ be two (projective) unitary representations
of $G$ in $\mathcal{H}_j;\ j = A, B$. The channel $\Phi : \mathfrak{T}(\mathcal{H}_A) \to \mathfrak{T}(\mathcal{H}_B)$ is covariant if

$$\Phi[V_g^A S (V_g^A)^*] = V_g^B \Phi[S] (V_g^B)^* \tag{6.48}$$

for all $g \in G$ and $S$. If the representation $V_g^B$ is irreducible, the channel $\Phi$ is
bistochastic. Indeed, $V_g^B \Phi[I_A] = \Phi[I_A] V_g^B$ (for all $g$), and irreducibility of $V_g^B$ implies
that $\Phi[I_A]$ is proportional to $I_B$, with the coefficient obtained by taking the traces.

Definition 6.36. The depolarizing channel in $\mathcal{H}$ is defined by the relation

$$\Phi[S] = (1 - p) S + p \frac{I}{d} \operatorname{Tr} S, \quad 0 \le p \le \frac{d^2}{d^2 - 1}. \tag{6.49}$$

If $p \le 1$, this describes a mixture of the ideal channel Id and the completely
depolarizing channel which maps an arbitrary state to the chaotic state $\bar S = \frac{I}{d}$. For the
whole range $0 \le p \le \frac{d^2}{d^2-1}$, complete positivity can be demonstrated by the explicit
Kraus representation (see (6.54) below).

Exercise 6.37. Show that the depolarizing channel can be characterized by the
property of unitary covariance: $\Phi[U S U^*] = U \Phi[S] U^*$ for an arbitrary unitary
operator $U$ in $\mathcal{H}$.

The depolarizing channel satisfies $\Phi^* = \Phi$ and is thus unital and bistochastic.
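
The admissible range of $p$ and the unitary covariance can both be probed numerically. The sketch below checks complete positivity through the eigenvalues of the (unnormalized) Choi matrix, which is a standard test and an assumption of this illustration rather than the route taken in the text; `d`, the random unitary `Q`, and the state `S` are arbitrary.

```python
import numpy as np

def depolarizing(S, p):
    """Formula (6.49), extended linearly to arbitrary matrices."""
    d = S.shape[0]
    return (1 - p) * S + p * np.trace(S) * np.eye(d) / d

def choi(channel, d):
    """Unnormalized Choi matrix sum_{ij} |i><j| (x) channel(|i><j|)."""
    J = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            E = np.zeros((d, d), dtype=complex)
            E[i, j] = 1
            J += np.kron(E, channel(E))
    return J

d = 3
p_max = d**2 / (d**2 - 1)
min_eig_at_max = np.linalg.eigvalsh(choi(lambda X: depolarizing(X, p_max), d)).min()
min_eig_beyond = np.linalg.eigvalsh(choi(lambda X: depolarizing(X, p_max + 0.05), d)).min()

# Unitary covariance for a random unitary Q (from a QR decomposition).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
S = np.diag([0.5, 0.3, 0.2]).astype(complex)
lhs = depolarizing(Q @ S @ Q.conj().T, 0.4)
rhs = Q @ depolarizing(S, 0.4) @ Q.conj().T
```

At $p = d^2/(d^2-1)$ the smallest Choi eigenvalue reaches zero, and slightly beyond that value it becomes negative, so the map stops being completely positive.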

Exercise 6.38. Show that the channel complementary to a covariant channel is
itself covariant:

$$\tilde\Phi[V_g^A S (V_g^A)^*] = V_g^E \tilde\Phi[S] (V_g^E)^*,$$

where $g \to V_g^E$ is a (projective) unitary representation of the group $G$ in $\mathcal{H}_E$.
Hint: use the fact that an arbitrary covariant channel has a Kraus representation
(6.18), in which operators $V_j$ satisfy the relations

$$V_g^B V_j (V_g^A)^* = \sum_k d_{jk}(g) V_k,$$

where $g \to D(g) = [d_{jk}(g)]$ is a matrix unitary representation of the group
$G$. It follows that the complementary map (6.42) is covariant and the role of the
second representation $V_g^E$ is played by $D(g)$.

For future use we introduce here an important general construction, namely a
discrete version of the Weyl operators and canonical commutation relations (CCR) in a
finite-dimensional Hilbert space. Fix an orthonormal basis $\{e_k;\ k = 0, \dots, d-1\}$ in $\mathcal{H}$.

The discrete Weyl operators are the unitary operators in $\mathcal{H}$ defined as

$$W_{\alpha\beta} = U^\alpha V^\beta; \quad \alpha, \beta = 0, \dots, d-1, \tag{6.50}$$

where

$$V|e_k\rangle = \exp\left(\frac{2\pi i k}{d}\right)|e_k\rangle, \quad U|e_k\rangle = |e_{k+1 \,(\mathrm{mod}\ d)}\rangle; \quad k = 0, \dots, d-1. \tag{6.51}$$

Notice that $W_{00} = I$. These operators satisfy the discrete analog of the Weyl-Segal
canonical commutation relations (CCR) for bosonic systems, which will appear in
Chapter 12:

$$W_{\alpha\beta} W_{\alpha'\beta'} = \exp\left(\frac{2\pi i \beta\alpha'}{d}\right) W_{\alpha+\alpha',\,\beta+\beta'} = \exp\left(\frac{2\pi i (\beta\alpha' - \beta'\alpha)}{d}\right) W_{\alpha'\beta'} W_{\alpha\beta}. \tag{6.52}$$

The first equality expresses the fact that $(\alpha, \beta) \to W_{\alpha\beta}$ is a projective representation
of the additive group $\mathbb{Z}_d \times \mathbb{Z}_d$.

The representation is irreducible: any operator $A$ that commutes with all $W_{\alpha\beta}$ is
a multiple of the unit operator. Indeed, $[A, V] = 0$ implies, by (6.51), that
$A|e_k\rangle = a_k|e_k\rangle$, and $[A, U] = 0$ implies $a_k \equiv a$. We then have an important identity:
for all states $S$

$$\sum_{\alpha,\beta=0}^{d-1} W_{\alpha\beta} S W_{\alpha\beta}^* = d\,(\operatorname{Tr} S)\, I, \tag{6.53}$$

since it follows from (6.52) that the left side commutes with all the Weyl operators.
The constant on the right is obtained by comparing traces. In fact, the identity (6.53)
is a very special case of a general identity, following from orthogonality relations for
an arbitrary irreducible representation (see e.g. Section IV.2 in [107]).
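
The relations (6.52) and (6.53) are easy to test for a small dimension. The following sketch assumes $d = 3$ and an arbitrary Hermitian test matrix `S` of unit trace; the function name `W` mirrors the notation of (6.50).

```python
import numpy as np

d = 3
omega = np.exp(2j * np.pi / d)
V = np.diag(omega ** np.arange(d))        # V|e_k> = exp(2 pi i k / d)|e_k>
U = np.roll(np.eye(d), 1, axis=0)         # U|e_k> = |e_{(k+1) mod d}>

def W(a, b):
    """Discrete Weyl operator W_{ab} = U^a V^b of (6.50)."""
    return np.linalg.matrix_power(U, a) @ np.linalg.matrix_power(V, b)

# First equality of (6.52), with indices added modulo d.
ccr_ok = all(
    np.allclose(W(a, b) @ W(a2, b2),
                omega ** (b * a2) * W((a + a2) % d, (b + b2) % d))
    for a in range(d) for b in range(d) for a2 in range(d) for b2 in range(d)
)

# Identity (6.53): the sum over all Weyl operators gives d (Tr S) I.
S = np.array([[0.5, 0.1, 0.0], [0.1, 0.3, 0.05j], [0.0, -0.05j, 0.2]])
twirl = sum(W(a, b) @ S @ W(a, b).conj().T for a in range(d) for b in range(d))
```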

Exercise 6.39. Prove the Kraus representation for the depolarizing channel (6.49):

$$\Phi[S] = \left(1 - p\,\frac{d^2 - 1}{d^2}\right) S + \frac{p}{d^2} \sum_{\alpha+\beta>0} W_{\alpha\beta} S W_{\alpha\beta}^*, \tag{6.54}$$

where all the coefficients are nonnegative, provided $0 \le p \le \frac{d^2}{d^2-1}$. Hint: substitute
the expression for $\operatorname{Tr} S$ from (6.53) into (6.49).
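
A numerical comparison of (6.49) with the Kraus form (6.54) is sketched below, assuming $d = 3$, an arbitrary diagonal test state, and a few sample values of $p$ including the endpoint $d^2/(d^2-1)$. The helper names are hypothetical.

```python
import numpy as np

def weyl(a, b, d):
    """Discrete Weyl operator U^a V^b in dimension d, as in (6.50)-(6.51)."""
    omega = np.exp(2j * np.pi / d)
    U = np.roll(np.eye(d), 1, axis=0)
    V = np.diag(omega ** np.arange(d))
    return np.linalg.matrix_power(U, a) @ np.linalg.matrix_power(V, b)

def depolarizing(S, p):
    d = S.shape[0]
    return (1 - p) * S + p * np.trace(S) * np.eye(d) / d

def depolarizing_kraus(S, p):
    """Right-hand side of (6.54)."""
    d = S.shape[0]
    c0 = 1 - p * (d**2 - 1) / d**2
    rest = sum(weyl(a, b, d) @ S @ weyl(a, b, d).conj().T
               for a in range(d) for b in range(d) if a + b > 0)
    return c0 * S + (p / d**2) * rest

d = 3
S = np.diag([0.6, 0.3, 0.1]).astype(complex)
p_values = [0.0, 0.5, 1.0, d**2 / (d**2 - 1)]
agree = all(np.allclose(depolarizing(S, p), depolarizing_kraus(S, p)) for p in p_values)
# Coefficient of the identity term at the upper end of the range of p.
c0_at_max = 1 - p_values[-1] * (d**2 - 1) / d**2
```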

Proposition 6.40. The depolarizing channel is entanglement-breaking for $p \ge \frac{d}{d+1}$.

Proof. To avoid notational confusion with the differential, in this proof we denote the
dimensionality as $d_A = \dim \mathcal{H}$. Let $\Theta$ be the unit sphere in $\mathcal{H}$ and let $\nu(d\theta)$ be the
uniform distribution on $\Theta$. Thus, $|\theta\rangle \in \Theta$ is a unit vector in $\mathcal{H}$. Now, as we show
below,

$$d_A \int_\Theta |\theta\rangle\langle\theta| \, \nu(d\theta) = I, \tag{6.55}$$

so that we have a continuous overcomplete system in $\mathcal{H}$. Then as we shall see in
Section 11.7, the relation $M(d\theta) = d_A |\theta\rangle\langle\theta| \nu(d\theta)$ defines an observable in $\mathcal{H}$, with
values in $\Theta$, in the sense of Definition 11.29. Its probability distribution in the state $S$
is given by

$$\mu_S(d\theta) = d_A \langle\theta|S|\theta\rangle \nu(d\theta).$$

We prove the statement by using the continuous analog of Proposition 6.32, which
will be established later, in Section 11.7.

It is sufficient to prove the statement for $p = \frac{d_A}{d_A+1}$. In this case, the depolarizing
channel for $p > \frac{d_A}{d_A+1}$ can be represented as a mixture of this channel and a
completely depolarizing channel $S \to \frac{I}{d_A} \operatorname{Tr} S$, which are both entanglement-breaking.
For $p = \frac{d_A}{d_A+1}$ we prove

$$\Phi[S] \equiv \frac{1}{d_A+1} S + \frac{d_A}{d_A+1} \frac{I}{d_A} = d_A \int_\Theta |\theta\rangle\langle\theta|S|\theta\rangle\langle\theta| \, \nu(d\theta) \tag{6.56}$$

for all density operators $S$, which is a continuous analog of the Kraus
representation (6.44), meaning that the channel is entanglement-breaking. We will use
Lemma IV.4.1 from [107], according to which, for any continuous function $F$ and
a fixed $|\theta'\rangle$,

$$\int_\Theta F(|\langle\theta|\theta'\rangle|) \, \nu(d\theta) = -\int_0^1 F(r) \, d(1 - r^2)^{d_A - 1}. \tag{6.57}$$

It suffices to establish (6.56) for all $S = |\theta'\rangle\langle\theta'|$, $\theta' \in \Theta$. Consider the operator

$$\sigma = d_A \int_\Theta |\theta\rangle\langle\theta|\theta'\rangle\langle\theta'|\theta\rangle\langle\theta| \, \nu(d\theta). \tag{6.58}$$

It has unit trace, since

$$\operatorname{Tr} \sigma = d_A \int_\Theta |\langle\theta|\theta'\rangle|^2 \, \nu(d\theta) = -d_A \int_0^1 r^2 \, d(1 - r^2)^{d_A - 1} = 1$$

by (6.57). From this, (6.55) follows by polarization.

Next, $\sigma$ commutes with all unitaries leaving invariant $|\theta'\rangle$. Hence, it must have the
form

$$\sigma = (1 - p)|\theta'\rangle\langle\theta'| + p \frac{I}{d_A}.$$

To find $p$ take $\langle\theta'|\sigma|\theta'\rangle$. We then obtain from (6.58)

$$(1 - p) + \frac{p}{d_A} = d_A \int_\Theta |\langle\theta|\theta'\rangle|^4 \, \nu(d\theta) = -d_A \int_0^1 r^4 \, d(1 - r^2)^{d_A - 1}.$$

Computing the integral with the formula (6.57) we obtain the value $\frac{2}{d_A+1}$, whence
$p = \frac{d_A}{d_A+1}$. □
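
The two integrals used in this proof can be evaluated by the substitution $u = 1 - r^2$. The following worked computation is a routine check, writing $d$ for $d_A$ and assuming $d \ge 2$:

```latex
\begin{align*}
-\int_0^1 r^{2}\, d(1-r^2)^{d-1}
  &= (d-1)\int_0^1 (1-u)\,u^{d-2}\,du
   = (d-1)\Bigl(\frac{1}{d-1}-\frac{1}{d}\Bigr) = \frac{1}{d},\\
-\int_0^1 r^{4}\, d(1-r^2)^{d-1}
  &= (d-1)\int_0^1 (1-u)^{2}\,u^{d-2}\,du
   = (d-1)\Bigl(\frac{1}{d-1}-\frac{2}{d}+\frac{1}{d+1}\Bigr) = \frac{2}{d(d+1)}.
\end{align*}
```

Multiplying by $d$ gives $\operatorname{Tr}\sigma = 1$ and $\langle\theta'|\sigma|\theta'\rangle = \frac{2}{d+1}$, so that $1 - p\,\frac{d-1}{d} = \frac{2}{d+1}$, that is, $p = \frac{d}{d+1}$.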

6.8 Qubit channels 

Let us consider channels in the qubit space $\mathcal{H}_2$ which are affine maps of the Bloch
ball in $\mathbb{R}^3$, satisfying an additional restriction imposed by complete positivity.

Proposition 6.41. An arbitrary linear, positive, trace-preserving map $\Phi$ of the space
$\mathfrak{T}(\mathcal{H}_2)$ can be represented in the form

$$\Phi[S] = U_2 \Lambda[U_1 S U_1^*] U_2^*, \tag{6.59}$$

where $U_1, U_2$ are unitary operators and $\Lambda$ has the following canonical form, in the
basis of Pauli matrices

$$\Lambda[I] = I + \sum_{\gamma=x,y,z} t_\gamma \sigma_\gamma, \quad \Lambda[\sigma_\gamma] = \lambda_\gamma \sigma_\gamma, \quad \gamma = x, y, z, \tag{6.60}$$

with $\lambda_\gamma, t_\gamma$ real.

Proof. The restriction of $\Phi$ onto the state space of qubit is an affine map, hence it
maps the state $S(a)$, given by (2.8), into the state $S(Ta + b)$, where $T$ is a real
$3 \times 3$-matrix and $b$ is a vector in $\mathbb{R}^3$. By using the polar decomposition for $T$ and then the
spectral decomposition for $|T|$, we have

$$T = O|T| = O_2 L O_1,$$

where $O, O_1, O_2$ are orthogonal matrices, $\det O_1 = \det O_2 = 1$ and $L =
\operatorname{diag}[\lambda_x, \lambda_y, \lambda_z]$ is diagonal, with the $\lambda$'s real but not necessarily nonnegative. Thus,

$$\Phi[S(a)] = S(O_2[L(O_1 a) + t]), \tag{6.61}$$

where $t = O_2^{-1} b$.

The matrices $O_1, O_2$ describe rotations of the Bloch ball in $\mathbb{R}^3$. Let us show that
they correspond to reversible evolutions in $\mathcal{H}$, generated by unitary operators $U$ in
$\mathcal{H}$ according to formula (6.2). It is sufficient to consider rotation $O$ by an angle $\varphi$
around the $z$-axis. In this case, a unit vector $a$ with the Euler angles $\theta, \phi$ is rotated
to $Oa$, with the angles $\theta, \phi + \varphi$, so that the corresponding state vector (2.9) in $\mathcal{H}$ is
transformed into

$$|\psi(Oa)\rangle = \begin{bmatrix} \cos\frac{\theta}{2}\, e^{-i(\phi+\varphi)/2} \\ \sin\frac{\theta}{2}\, e^{i(\phi+\varphi)/2} \end{bmatrix} = U|\psi(a)\rangle, \tag{6.62}$$

where $U = \operatorname{diag}[e^{-i\varphi/2}, e^{i\varphi/2}]$ is a unitary matrix. Then

$$S(Oa) = U S(a) U^* \tag{6.63}$$

for all $a$.

By using (6.61) we obtain the required statement (6.59). □

Exercise 6.42. Show that a rotation of the Bloch ball around the axis $a$ by the
angle $\varphi$ is implemented by the unitary operator

$$U = \exp\left[-\frac{i\varphi}{2}\sigma(a)\right] = \cos\frac{\varphi}{2}\, I - i \sin\frac{\varphi}{2}\, \sigma(a).$$
Let us focus our attention on the maps of the form (6.60). Of course, complete
positivity imposes nontrivial restrictions on the parameters $\lambda_\gamma, t_\gamma$ [174]. Especially simple
is the case where $t_\gamma \equiv 0$, when the map $\Phi$, and hence $\Lambda$, is unital. In this case $\Lambda$ is a
contraction of the Bloch ball along the axes $x, y, z$, with coefficients $|\lambda_x|, |\lambda_y|, |\lambda_z|$,
combined with reflections in case some of the numbers $\lambda_\gamma$ are negative (for example,
$\lambda_y < 0$ implies reflection with respect to the plane $xz$). Transposition of the density
matrix $S(a)$, which is not completely positive, corresponds to $\Lambda$ with the parameters
$\lambda_x = 1$, $\lambda_y = -1$, $\lambda_z = 1$. It follows, in particular, that the relation of the type (6.63)
also holds for orthogonal matrices $O$ with $\det O = -1$, but with an antiunitary
operator $U$. Since any affine bijection of the Bloch ball is apparently implemented by an
orthogonal matrix $O$, this implies Theorem 6.5 in the case of a qubit.

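Exercise 6.43. By using the multiplication rules (2.17), show that in the case
$t_\gamma \equiv 0$

$$\Lambda[S] = \sum_{\gamma=0,x,y,z} \mu_\gamma \sigma_\gamma S \sigma_\gamma, \tag{6.64}$$

where

$$\mu_0 = \tfrac{1}{4}(1 + \lambda_x + \lambda_y + \lambda_z), \quad \mu_x = \tfrac{1}{4}(1 + \lambda_x - \lambda_y - \lambda_z),$$
$$\mu_y = \tfrac{1}{4}(1 - \lambda_x + \lambda_y - \lambda_z), \quad \mu_z = \tfrac{1}{4}(1 - \lambda_x - \lambda_y + \lambda_z).$$

Non-negativity of these numbers is the necessary and sufficient condition for the
complete positivity of $\Lambda$ and hence of $\Phi$.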
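
The relation between the parameters $\lambda_\gamma$ and the weights $\mu_\gamma$ in (6.64) can be illustrated as follows. The sample values of $\lambda$ are arbitrary, and the function names are hypothetical; $\sigma_0$ denotes the identity.

```python
import numpy as np

PAULI = [
    np.eye(2, dtype=complex),                       # sigma_0 = I
    np.array([[0, 1], [1, 0]], dtype=complex),      # sigma_x
    np.array([[0, -1j], [1j, 0]], dtype=complex),   # sigma_y
    np.array([[1, 0], [0, -1]], dtype=complex),     # sigma_z
]

def mu_from_lambda(lx, ly, lz):
    """Weights (mu_0, mu_x, mu_y, mu_z) of Exercise 6.43."""
    return 0.25 * np.array([1 + lx + ly + lz, 1 + lx - ly - lz,
                            1 - lx + ly - lz, 1 - lx - ly + lz])

def apply_unital(S, lam):
    """Right-hand side of (6.64) for the unital case t = 0."""
    mu = mu_from_lambda(*lam)
    return sum(m * (P @ S @ P) for m, P in zip(mu, PAULI))

lam = (0.5, 0.3, 0.2)
mu = mu_from_lambda(*lam)                  # all entries nonnegative: CP
image_x = apply_unital(PAULI[1], lam)      # should be lambda_x * sigma_x
mu_transpose = mu_from_lambda(1, -1, 1)    # transposition: one weight is negative
```

For the transposition map the weight $\mu_y$ equals $-\tfrac12$, which reflects the fact that transposition is positive but not completely positive.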

In the case $d = 2$, the “discrete Weyl operators” are nothing but the Pauli matrices,
namely, with the convention $|e_0\rangle = |{\uparrow}\rangle$, $|e_1\rangle = |{\downarrow}\rangle$, we have

$$W_{01} = V = \sigma_z, \quad W_{10} = U = \sigma_x, \quad W_{11} = UV = -i\sigma_y, \tag{6.65}$$

and the “discrete CCR” are the multiplication rules (2.17).

Exercise 6.44. Show that qubit unital channels (6.64) are covariant with respect
to the projective unitary representation of the group $\mathbb{Z}_2 \times \mathbb{Z}_2$ defined by (6.65).


6.9 Notes and references 

1. The proof of Theorem 6.5, which goes back to Wigner's famous theorem [220], follows
the book of Davies (see Section 2.3 and Comments in [48]).

2. The notion of a completely positive map, introduced by Stinespring [202] in the
more general context of C*-algebras, was thoroughly studied in the paper of
Arveson [9]. A useful survey of the properties of CP maps, including the proof of the
inequality from Exercise 6.7, is in the paper of Stormer (pp. 85-106 in the volume [61]).
In the finite-dimensional case, which has its own special features, CP maps were
considered in detail in the work of Choi [41]. In these works one can find the conditions
for extremality of CP maps.

The proof of Theorem 6.9 relies upon the Gelfand-Naimark-Segal (GNS) construction,
in which, from an object having a positivity property, one constructs a representation in
a Hilbert space. The above proof, with a minor modification, works for CP maps of
C*-algebras in infinite dimensions. In the case of the algebra $\mathfrak{B}(\mathcal{H})$, Lemma 6.10
allows us to obtain the more special result (6.6).

3. The characteristic CP property of the dynamics of an open quantum system was
observed in particular by Lindblad [148] and Holevo [92]. However, it was already
implicit in the notion of “dynamical matrix” introduced by Sudarshan et al. [206], as well
as the representation (6.18). Equivalence of this representation with complete
positivity was proved by Choi [41]. Later it was independently considered in great detail
in the book of Kraus [139]. Some authors use the name operator-sum representation
for (6.17), (6.18). The correspondence (6.26) was introduced by Jamiolkowski [123]
for positive maps and by Choi [41] in the case of CP maps. Many interesting
applications of this correspondence in the field of quantum information theory are considered
by Verstraete and Verschelde [213].

Extensive literature is devoted to the Markovian dynamics of an open system
described by a quantum dynamical semigroup, i.e. the semigroup of normalized positive
or completely positive maps. A survey of quantum dynamical semigroups and closely
related quantum stochastic processes can be found in the books [48], [99].

4. Q-c-q channels were introduced in the paper [98]. A detailed study of
entanglement-breaking channels, including the proof of Proposition 6.22, is given in the paper of
M. Horodecki, Shor, and Ruskai [121].

Entanglement-binding channels were introduced in the Horodeckis' paper [118], where
examples are given of PPT channels that are not entanglement-breaking.

5. For detailed discussions of mathematical models for the quantum measurement
process, see the books of Ludwig [152], Davies [48], Kraus [139], Holevo [99], and the
papers of Lindblad [146] and Ozawa [163].

6. The notion of complementary channel was introduced in the paper of Devetak and
Shor [52]. The results in this section are due to Holevo [103], and to King, Matsumoto,
Natanson, and Ruskai [132].

7. Discrete Weyl operators were used in the famous paper of Bennett, Brassard,
Crepeau, Jozsa, Peres, and Wootters [19] on quantum teleportation. Concerning the
covariance of the complementary channel (Exercise 6.38) see [103]. The
Choi-Jamiolkowski state of the depolarizing channel is the isotropic state (a mixture of
chaotic and maximally entangled states) [170], and a usual way to prove
Proposition 6.40, as well as its converse, is to study the separability of that state (see
e.g. [128]).

8. Realistic examples of qubit channels are considered in detail in the book of Nielsen
and Chuang [158]. The structure of general qubit channels is described in detail by
Ruskai, Szarek, and E. Werner [174], see also Verstraete and Verschelde [213].



Chapter 7 

Quantum entropy and information quantities 


7.1 Quantum relative entropy 

The classical relative entropy was considered in Section 4.2, where its monotonicity 
property and relation to the Shannon mutual information were established. The fol¬ 
lowing definition introduces the noncommutative analog of the relative entropy, which 
plays an important role in quantum information theory. Many properties of the rela¬ 
tive entropy are generalized to the quantum case, but some proofs become much more 
involved. 

Definition 7.1. Let $S, T$ be two density operators. If $T$ is nondegenerate, the quantum
relative entropy is defined by the formula

$$H(S;T) = -H(S) - \operatorname{Tr} S \log T = \operatorname{Tr} S(\log S - \log T). \tag{7.1}$$

The case where $\operatorname{supp} S \subset \operatorname{supp} T$ ($\operatorname{supp} S$ denotes the support of $S$, see Section 1.3)
is reduced to the previous situation by considering the restrictions of the operators
$S, T$ to $\operatorname{supp} T$, where $T$ is nondegenerate. If however $\operatorname{supp} S \not\subset \operatorname{supp} T$, then, by
definition, $H(S;T) = +\infty$.

Exercise 7.2. Let $\lambda_j, \mu_k$ be the eigenvalues and $|e_j\rangle, |h_k\rangle$ the respective
eigenvectors of the density operators $S, T$. Prove that in this case,

$$H(S;T) = \sum_{j,k} |\langle e_j|h_k\rangle|^2 (\lambda_j \log \lambda_j - \lambda_j \log \mu_k). \tag{7.2}$$

Proposition 7.3.

$$H(S;T) \ge \frac{\log e}{2} \operatorname{Tr}(S - T)^2 \ge 0, \tag{7.3}$$

with equalities if and only if $S = T$.

Proof. By using the formula

$$\eta(\lambda) - \eta(\mu) = (\lambda - \mu)\eta'(\mu) + \tfrac{1}{2}(\lambda - \mu)^2 \eta''(\xi),$$

where $\xi$ lies between $\lambda$ and $\mu$, we obtain for the function (4.2) the inequality

$$\lambda \log \lambda - \lambda \log \mu \ge \log e \left[ (\lambda - \mu) + \tfrac{1}{2}(\lambda - \mu)^2 \right].$$

Substituting this into (7.2), we get the result.
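
Formula (7.2) gives a direct way to compute the relative entropy from the two spectral decompositions. The sketch below is a minimal illustration, assuming binary logarithms (consistent with the factor $\log e$ in (7.3)) and a numerical tolerance `tol` for deciding which eigenvalues and overlaps count as zero; the function name is hypothetical.

```python
import numpy as np

def relative_entropy(S, T, tol=1e-12):
    """H(S;T) via formula (7.2), in bits; +inf if supp S is not inside supp T."""
    lam, E = np.linalg.eigh(S)
    mu, Hv = np.linalg.eigh(T)
    overlap = np.abs(E.conj().T @ Hv) ** 2     # overlap[j, k] = |<e_j|h_k>|^2
    total = 0.0
    for j in range(len(lam)):
        if lam[j] <= tol:
            continue                            # 0 log 0 = 0
        total += lam[j] * np.log2(lam[j])
        for k in range(len(mu)):
            if overlap[j, k] <= tol:
                continue
            if mu[k] <= tol:
                return np.inf                   # S has weight outside supp T
            total -= overlap[j, k] * lam[j] * np.log2(mu[k])
    return total

S = np.array([[0.6, 0.2], [0.2, 0.4]])
T = np.diag([0.9, 0.1])
h = relative_entropy(S, T)
lower_bound = 0.5 * np.log2(np.e) * np.trace((S - T) @ (S - T))   # bound (7.3)
```

For orthogonal pure states, such as `np.diag([1.0, 0.0])` and `np.diag([0.0, 1.0])`, the function returns `inf`, in agreement with Definition 7.1.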


Corollary 7.4 (Subadditivity of Quantum Entropy). Let $S_{12}$ be a density operator in
$\mathcal{H}_1 \otimes \mathcal{H}_2$, with partial states $S_1 = \operatorname{Tr}_{\mathcal{H}_2} S_{12}$ and $S_2 = \operatorname{Tr}_{\mathcal{H}_1} S_{12}$. Then

$$H(S_{12}) \le H(S_1) + H(S_2), \tag{7.4}$$

with equality if and only if $S_{12} = S_1 \otimes S_2$.

Proof. This follows from the equality

$$H(S_1) + H(S_2) - H(S_{12}) = H(S_{12}; S_1 \otimes S_2)$$

and the inequality (7.3). □
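
The equality used in this proof can be checked on a random full-rank state. In the sketch below the dimensions, the seed, and the helper names are assumptions; logarithms are binary, and the matrix logarithm is computed from the eigendecomposition, which requires strictly positive eigenvalues.

```python
import numpy as np

def log2m(A):
    """Binary matrix logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(A)
    return (v * np.log2(w)) @ v.conj().T

def entropy(A):
    w = np.linalg.eigvalsh(A)
    w = w[w > 1e-15]
    return float(-np.sum(w * np.log2(w)))

def rel_entropy(S, T):
    return float(np.real(np.trace(S @ (log2m(S) - log2m(T)))))

rng = np.random.default_rng(2)
d1, d2 = 2, 3
X = rng.normal(size=(d1 * d2, d1 * d2)) + 1j * rng.normal(size=(d1 * d2, d1 * d2))
S12 = X @ X.conj().T
S12 /= np.trace(S12).real                   # random full-rank density matrix
r = S12.reshape(d1, d2, d1, d2)
S1 = np.einsum('ikjk->ij', r)               # trace out the second system
S2 = np.einsum('kikj->ij', r)               # trace out the first system

lhs = entropy(S1) + entropy(S2) - entropy(S12)
rhs = rel_entropy(S12, np.kron(S1, S2))
```

Both `lhs` and `rhs` compute the same nonnegative quantity, which vanishes only for product states.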

Similar to the ordinary entropy (see Exercise 5.3), the relative entropy has the
following elementary properties, to be proved in the following exercise:

Exercise 7.5.

i. Isometry invariance: $H(VSV^*; VS'V^*) = H(S; S')$ for an arbitrary
operator $V$ isometric on the supports of $S, S'$;

ii. Additivity: $H(S_1 \otimes S_2; S_1' \otimes S_2') = H(S_1; S_1') + H(S_2; S_2')$.


7.2 Monotonicity of the relative entropy 

The following key property is the noncommutative analog of the inequality (4.18). 

Theorem 7.6 (Lindblad [148]). For arbitrary density operators $S, T$ in $\mathcal{H}_1$, and an
arbitrary channel $\Phi : \mathfrak{T}(\mathcal{H}_1) \to \mathfrak{T}(\mathcal{H}_2)$,

$$H(\Phi[S]; \Phi[T]) \le H(S;T). \tag{7.5}$$

This important result follows from Theorem 7.7, to be proved below, which
establishes a similar monotonicity property for a whole class of functions of a pair of
quantum states.

Consider the Hilbert space $L^2(\mathcal{H})$ of linear operators in $\mathcal{H}$, with the inner product

$$(X, Y) = \operatorname{Tr} X^* Y.$$

Let $S, T$ be nondegenerate operators. In $L^2(\mathcal{H})$, we introduce the operator $L_T$ of
left multiplication by $T$ and the operator $R_S$ of right multiplication by $S$, namely,
$L_T X = TX$, $R_S X = XS$. They are positive commuting Hermitian operators in the
Hilbert space $L^2(\mathcal{H})$.

The relative $g$-entropy of operators $S, T$ is defined by the relation

$$H_g(S;T) = (S^{1/2}, g(L_T R_S^{-1}) S^{1/2}), \tag{7.6}$$

for any function $g \in \mathcal{G}$, where $\mathcal{G}$ is the class of functions of the form

$$g(w) = a(w - 1) + b(w - 1)^2 + \int_0^\infty \frac{(w - 1)^2}{w + s} \, d\nu(s), \tag{7.7}$$

with $a$ real, $b \ge 0$, and $\nu$ a positive measure on $[0, \infty)$ such that $\int_0^\infty \frac{1}{s+1} \, d\nu(s) < \infty$
(concerning the inner characterization of $\mathcal{G}$ as the class of operator-convex functions
such that $g(1) = 0$ see Section 7.8). For the proof of the following theorem it is only
important that the function $g(w)$ admits such a representation.

Theorem 7.7 (Petz [167]). For arbitrary $g \in \mathcal{G}$, the relative $g$-entropy $H_g(S;T)$ has
the monotonicity property

$$H_g(\Phi[S]; \Phi[T]) \le H_g(S;T), \tag{7.8}$$

where $\Phi$ is an arbitrary channel.

Theorem 7.6 then follows from the two observations:

i. The function $g(w) = -\log w$ belongs to the class $\mathcal{G}$. Indeed, by performing
integration by parts, one can check that

$$-\ln w = \int_0^\infty \left[ \frac{1}{w + s} - \frac{1}{1 + s} \right] ds = -(w - 1) + \int_0^\infty \frac{(w - 1)^2}{(w + s)(s + 1)^2} \, ds.$$

ii. $H_{-\log}(S;T) = H(S;T)$. We have

$$-\log(L_T R_S^{-1}) X = (\log R_S) X - (\log L_T) X = X(\log S) - (\log T) X$$

because of the commutativity of the operators $L_T, R_S$, whence we obtain
$-\log(L_T R_S^{-1}) S^{1/2} = (\log S - \log T) S^{1/2}$, and the result follows from (7.6).
Proof of Theorem 7.7.

Lemma 7.8. For an arbitrary function $g \in \mathcal{G}$,

$$H_g(S;T) = b \operatorname{Tr}(T - S) S^{-1} (T - S) + \int_0^\infty \operatorname{Tr}(T - S)(L_T + sR_S)^{-1}(T - S) \, d\nu(s). \tag{7.9}$$

Proof. By introducing the notation $\Delta_{T,S} = L_T R_S^{-1}$, we see that

$$(\Delta_{T,S} - I) S^{1/2} = (T - S) S^{-1/2} = R_S^{-1/2}(T - S), \tag{7.10}$$

hence,

$$H_{w-1}(S;T) = \operatorname{Tr} S^{1/2}(T - S) S^{-1/2} = 0, \tag{7.11}$$

so that the linear term in (7.7) does not contribute to (7.9). For $g(w) = \frac{(w-1)^2}{w+s}$, $s \ge 0$,
we find, by using (7.10),

$$\begin{aligned}
H_g(S;T) &= ((\Delta_{T,S} - I) S^{1/2}, (\Delta_{T,S} + sI)^{-1} (\Delta_{T,S} - I) S^{1/2}) \\
&= \operatorname{Tr}[(T - S)(\Delta_{T,S} + sI)^{-1} R_S^{-1} (T - S)] \\
&= \operatorname{Tr}(T - S)(L_T + sR_S)^{-1}(T - S).
\end{aligned} \tag{7.12}$$

For $s = 0$ we get

$$H_{(w-1)^2/w}(S;T) = \operatorname{Tr}(T - S) T^{-1} (T - S) = H_{(w-1)^2}(T;S). \tag{7.13}$$

By substituting this into (7.7), we get (7.9). □

If we neglect the first term in (7.7) (which does not contribute to $H_g(S;T)$), the
functions (7.7) are “continual convex combinations” of the functions $(w - 1)^2$ and
$\frac{(w-1)^2}{w+s}$, $s \ge 0$. Thus, it is sufficient to prove monotonicity of the relative $g$-entropy
for such functions $g$, i.e. for (7.12).

The proof of Theorem 7.7 now follows from the integral representation (7.9) and
the following


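Lemma 7.9. For an arbitrary channel $\Phi$, $s \ge 0$, and an operator $A$,

$$\operatorname{Tr} A^* (L_T + sR_S)^{-1} A \ge \operatorname{Tr} \Phi[A^*] (L_{\Phi[T]} + sR_{\Phi[S]})^{-1} \Phi[A]. \tag{7.14}$$

Proof. Note that positive maps respect Hermitian conjugation, so that $\Phi[A^*] =
\Phi[A]^*$, and similarly for $\Phi^*$.

Since $L_T, R_S$ are positive operators in $L^2(\mathcal{H})$, the operator $L_T + sR_S$ is also
positive. Set $X = (L_T + sR_S)^{-1/2} A - (L_T + sR_S)^{1/2} \Phi^*[B]$, where $B = (L_{\Phi[T]} +
sR_{\Phi[S]})^{-1} \Phi[A]$. In this case, $\operatorname{Tr} X^* X \ge 0$, so that

$$\operatorname{Tr} A^* (L_T + sR_S)^{-1} A - \operatorname{Tr} A^* \Phi^*[B] - \operatorname{Tr} \Phi^*[B^*] A + \operatorname{Tr} \Phi^*[B^*] (L_T + sR_S) \Phi^*[B] \ge 0. \tag{7.15}$$

We have

$$\operatorname{Tr} A^* \Phi^*[B] + \operatorname{Tr} \Phi^*[B^*] A = 2 \operatorname{Tr} \Phi[A^*] (L_{\Phi[T]} + sR_{\Phi[S]})^{-1} \Phi[A],$$

therefore, in order to prove the lemma, it is sufficient to show that the last term
in (7.15) is less than or equal to the right-hand side of (7.14). Thus,

$$\begin{aligned}
\operatorname{Tr} \Phi^*[B^*] (L_T + sR_S) \Phi^*[B] &= \operatorname{Tr}[s\Phi^*[B^*] \Phi^*[B] S + \Phi^*[B^*] T \Phi^*[B]] \\
&= \operatorname{Tr}[s\Phi^*[B^*] \Phi^*[B] S + \Phi^*[B] \Phi^*[B^*] T] \\
&\le \operatorname{Tr}[s\Phi^*[B^* B] S + \Phi^*[B B^*] T],
\end{aligned}$$

where the inequality follows from the positivity of $S, T$ and the Kadison
inequality (6.5), which in the case of a channel $\Phi$ takes the form

$$\Phi^*[B^*] \Phi^*[B] \le \Phi^*[B^* B].$$

Then by using the relation $\operatorname{Tr} \Phi^*[B^* B] S = \operatorname{Tr} B^* B \Phi[S]$, we find

$$\begin{aligned}
\operatorname{Tr} \Phi^*[B^*] (L_T + sR_S) \Phi^*[B] &\le \operatorname{Tr}(sB^* B \Phi[S] + B B^* \Phi[T]) \\
&= \operatorname{Tr} B^* (sB \Phi[S] + \Phi[T] B) \\
&= \operatorname{Tr} B^* (L_{\Phi[T]} + sR_{\Phi[S]})(B) = \operatorname{Tr} B^* \Phi[A] \\
&= \operatorname{Tr} \Phi[A^*] (L_{\Phi[T]} + sR_{\Phi[S]})^{-1} \Phi[A]. \qquad \square
\end{aligned}$$

This proves Theorem 7.7 in the case of nondegenerate $S, T$. In the general case,
either $H(S;T) = +\infty$ and the inequality (7.5) is trivial, or $\operatorname{supp} S \subset \operatorname{supp} T$, in
which case we can assume that $\operatorname{supp} T = \mathcal{H}$, i.e. the operator $T$ is nondegenerate
and for $H(S;T)$ the formula (7.1) holds. Approximating an arbitrary operator $S$ by
nondegenerate operators and using the continuity of the entropy (see Section 7.4),
we obtain the statement of the theorem in the general case. □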

We now consider a number of useful corollaries of the Monotonicity Theorem 7.6. 

Corollary 7.10 (Generalized H-theorem). Let $\Phi$ be a bistochastic (unital) channel in
$\mathcal{H} = \mathcal{H}_1 = \mathcal{H}_2$. Then, for any state $S$,

$$H(\Phi[S]) \ge H(S). \tag{7.16}$$

Proof. From (7.1) it follows that

$$H(S) = \log d - H(S; \bar S), \tag{7.17}$$

where $\bar S = \frac{I}{d}$ is the chaotic state, hence

$$H(\Phi[S]) = \log d - H(\Phi[S]; \Phi[\bar S])$$

because $\Phi[\bar S] = \bar S$. The corollary now follows by virtue of the Monotonicity
Theorem 7.6. □
Corollary 7.11. The quantity (5.8) is monotone under the action of an arbitrary
channel $\Phi$:

$$\chi(\{\pi_x\}; \{\Phi[S_x]\}) \le \chi(\{\pi_x\}; \{S_x\}). \tag{7.18}$$

Proof. By introducing the average state $\bar S_\pi = \sum_x \pi_x S_x$, we obtain the important
identity generalizing (4.21)

$$\chi(\{\pi_x\}; \{S_x\}) = H(\bar S_\pi) - \sum_x \pi_x H(S_x) = \sum_x \pi_x H(S_x; \bar S_\pi). \tag{7.19}$$

Therefore (7.18) follows directly from (7.5). □

In particular, Theorem 7.6 implies the bound (5.16) on the Shannon information.
Take an arbitrary basis $\{e_y\}$ in $\mathcal{H}$ and consider the q-c channel

$$\Psi[S] = \sum_y (\operatorname{Tr} S M_y) |e_y\rangle\langle e_y|,$$

corresponding to the measurement of the observable $M = \{M_y\}$; then

$$\Psi[S_x] = \sum_y p_M(y|x) |e_y\rangle\langle e_y|, \qquad \Psi[\bar S_\pi] = \sum_y \left( \sum_x \pi_x p_M(y|x) \right) |e_y\rangle\langle e_y|,$$

where $p_M(y|x) = \operatorname{Tr} S_x M_y$, are density operators that are diagonal in the basis
$\{e_y\}$. Therefore, the quantum relative entropy at the output of the channel $\Psi$ becomes
the classical relative entropy, and relation (4.21) implies

$$\sum_x \pi_x H(\Psi[S_x]; \Psi[\bar S_\pi]) = \mathcal{I}_1(\pi, M).$$

By using the relation (7.18) for the channel $\Psi$ we obtain the required inequality.

A useful quantity, motivated by Corollary 7.10, is the entropy gain

$$G_\Phi[S] = H(\Phi[S]) - H(S). \tag{7.20}$$

Corollary 7.12. The entropy gain is convex as a function of $S$ and is superadditive:

$$G_{\Phi_1 \otimes \Phi_2}[S_{12}] \ge G_{\Phi_1}[S_1] + G_{\Phi_2}[S_2]. \tag{7.21}$$

Proof. Taking into account the first identity in (7.19), the convexity of $G_\Phi[S]$ is just a
reformulation of relation (7.18). The superadditivity (7.21) is equivalent to the relation

$$H((\Phi_1 \otimes \Phi_2)[S_{12}]; (\Phi_1 \otimes \Phi_2)[S_1 \otimes S_2]) \le H(S_{12}; S_1 \otimes S_2),$$

which follows from the Monotonicity Theorem 7.6. □
7.3 Strong subadditivity of the quantum entropy

Theorem 7.13. The following properties are equivalent:

i. monotonicity of the quantum relative entropy;

ii. strong subadditivity of the quantum entropy: let $S_{123}$ be a density operator in
$\mathcal{H}_1 \otimes \mathcal{H}_2 \otimes \mathcal{H}_3$. In this case, with the obvious notations for the partial states,

$$H(S_{123}) + H(S_2) \le H(S_{12}) + H(S_{23}); \tag{7.22}$$

iii. joint convexity of the quantum relative entropy $H(S;T)$ with respect to the
arguments $S, T$.

Proof.

i. $\Rightarrow$ ii. Denoting by $\bar S_\alpha = (\dim \mathcal{H}_\alpha)^{-1} I_\alpha$ the chaotic state in $\mathcal{H}_\alpha$, we have

$$H(S_{123}; \bar S_{123}) = H(S_{123}; S_{12} \otimes \bar S_3) + H(S_{12}; \bar S_{12}),$$

$$H(S_{23}; \bar S_{23}) = H(S_{23}; S_2 \otimes \bar S_3) + H(S_2; \bar S_2).$$

Performing subtraction and using (7.17), we get

$$H(S_{12}) + H(S_{23}) - H(S_{123}) - H(S_2) = H(S_{123}; S_{12} \otimes \bar S_3) - H(S_{23}; S_2 \otimes \bar S_3).$$

Consider the partial trace channel $\Phi[S_{123}] = S_{23}$. Applying Theorem 7.6 to it, we
deduce the nonnegativity of the right hand side, i.e. the strong subadditivity.

ii. =>■ iii. Consider the density operator in M\ 0 M 2 ® M 3 of the form 

Sl23 = \ e k)( e k\ ® A kl ® \hl){hl\, 
kl 

where { e is an orthonormal basis in 3t\, {hi } is an orthonormal basis in 3t 3 , and 
Am are positive operators in M 2 , which become density operators with an appropriate 
normalization. In this case, 

Sv2 = 'Y2t\ e k){ e k\®'Yh A kh S 23 = Y2 A kl ® \hl){hl\, S 2 = YA kh 

k l kl kl 

and the strong subadditivity implies 

- e H(A kl ) + e " (e + E " (E - « (e ■ 4 «) i o- 

kl k \ l / l \ k ) \kl / 
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where we denoted H(A ) = —TrA log A for a positive operator A. In particular, for 
arbitrary positive operators A\, A 2 , B\, B 2 , 

H(Ai + A2 + Bi + B2) 

- H(A l + A 2 ) - H(B 1 + B 2 ) < H(A l + B x ) - H(A{) - H{Bf, 

+ H(A 2 + b 2 ) - //(^ 2 ) - H(B 2 ), 

which is equivalent to the joint convexity of the function 

A(74, B ) = #04 + fi) - //(A) - //(fi) 

with respect to A, B. On the other hand, the following lemma implies that H(A; B) 
is convex as a supremum of a family of convex functions. 

Lemma 7.14. For any two density operators $A$, $B$

$$H(A;B) = \sup_{\lambda>0}\lambda^{-1}\chi_\lambda(A;B), \tag{7.23}$$

where

$$\chi_\lambda(A;B) = H(\lambda A + (1-\lambda)B) - \lambda H(A) - (1-\lambda)H(B) = \Delta(\lambda A,(1-\lambda)B) + h(\lambda).$$

Proof. From the proof above, it follows that the function $\chi_\lambda(A;B)$ is convex with
respect to $A$, $B$. As a function of $\lambda\in[0,1]$ it is concave and equal to zero for $\lambda = 0, 1$
(this was proved in Section 5.3). Hence, for fixed $A$, $B$

$$\sup_{\lambda>0}\lambda^{-1}\chi_\lambda(A;B) = \lim_{\lambda\to+0}\lambda^{-1}\chi_\lambda(A;B) = \frac{d}{d\lambda}\Big|_{\lambda=0}\chi_\lambda(A;B).$$


Exercise 7.15. Let $F$ be a continuously differentiable function on $(0,+\infty)$.
Show that for any two positive operators $A$, $B$ the following holds:

$$\frac{d}{d\lambda}\operatorname{Tr} F(\lambda A + (1-\lambda)B) = \operatorname{Tr} F'(\lambda A + (1-\lambda)B)(A - B). \tag{7.24}$$

Hint: prove (7.24) for the case where $F(x) = x^k$, implying the result for polynomials,
and then use the Weierstrass uniform approximation theorem.

Now let $\operatorname{supp} A\subset\operatorname{supp} B$, so that $H(A;B) < +\infty$. Without loss of generality
we assume that the operator $B$ is nondegenerate. Computing the derivative of

$$H(\lambda A + (1-\lambda)B) = \operatorname{Tr}\eta(\lambda A + (1-\lambda)B)$$

by formula (7.24) we obtain

$$\frac{d}{d\lambda}\Big|_{\lambda=0}\chi_\lambda(A;B) = \operatorname{Tr} A(\log A - \log B) + (\log e)\operatorname{Tr}(B - A) = H(A;B),$$

from which (7.23) follows. □

Exercise 7.16. Prove the statement (7.23) for the case $H(A;B) = +\infty$.
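Lemma 7.14 can be checked numerically for full-rank states. The sketch below is only an illustration, not part of the argument: it assumes natural logarithms (so that $\log e = 1$), and the helper names `random_state`, `log_h`, `chi` and `relative_entropy` are hypothetical. It prints the ratio $\lambda^{-1}\chi_\lambda(A;B)$ for decreasing $\lambda$ next to $H(A;B)$.

```python
import numpy as np

def random_state(d, rng):
    """Random full-rank density matrix (illustrative construction, an assumption)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T + 0.5 * np.eye(d)
    return rho / np.trace(rho).real

def log_h(a):
    """Matrix logarithm of a positive definite Hermitian matrix via its spectrum."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T

def entropy(a):
    """H(A) = -Tr A log A with natural logarithm."""
    w = np.linalg.eigvalsh(a)
    w = w[w > 1e-15]
    return float(-np.sum(w * np.log(w)))

def relative_entropy(a, b):
    """H(A;B) = Tr A (log A - log B) for full-rank density matrices."""
    return float(np.trace(a @ (log_h(a) - log_h(b))).real)

def chi(lam, a, b):
    """chi_lambda(A;B) = H(lam A + (1-lam) B) - lam H(A) - (1-lam) H(B)."""
    return entropy(lam * a + (1 - lam) * b) - lam * entropy(a) - (1 - lam) * entropy(b)

rng = np.random.default_rng(0)
A, B = random_state(3, rng), random_state(3, rng)
lambdas = [1.0, 0.5, 0.1, 1e-2, 1e-4]
ratios = [chi(lam, A, B) / lam for lam in lambdas]
print("H(A;B)            =", relative_entropy(A, B))
print("chi_lambda/lambda =", ratios)
```

Since $\chi_\lambda$ is concave in $\lambda$ and vanishes at $\lambda = 0$, the printed ratios should grow as $\lambda$ decreases and approach $H(A;B)$ from below.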

iii. $\Rightarrow$ i. Consider the density operator $S_{12}$ in $\mathcal H_1\otimes\mathcal H_2$ and let $S_1 = \operatorname{Tr}_2 S_{12}$. From
the property (6.53) of the discrete Weyl operators, we obtain

$$S_1\otimes\bar S_2 = \frac{1}{d^2}\sum_{\alpha,\beta=0}^{d-1}(I_1\otimes W_{\alpha\beta})\,S_{12}\,(I_1\otimes W_{\alpha\beta})^*.$$

The joint convexity of the relative entropy and the properties from Exercise 7.5 then
imply

$$H(S_1; S_1') \le H(S_{12}; S_{12}'),$$

which is monotonicity with respect to the partial trace channel. But, according to
Theorem 6.9, every channel can be represented as a concatenation of an isometric
embedding (preserving the relative entropy) and a partial trace. This proves i. □
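The strong subadditivity (7.22) is easy to test numerically. The following is a minimal sketch, assuming natural logarithms and random mixed states built from complex Gaussian matrices; the helpers `partial_trace`, `ssa_gap` and `random_state` are hypothetical names introduced here for illustration.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy H(S) = -Tr S log S (natural logarithm)."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def partial_trace(rho, dims, keep):
    """Reduced state on the subsystems listed in `keep` (sorted indices)."""
    n = len(dims)
    t = rho.reshape(list(dims) + list(dims))
    # Trace out unwanted subsystems from the last to the first, so that the
    # row axis i and the column axis i + (number of remaining systems) match.
    for i in sorted(set(range(n)) - set(keep), reverse=True):
        t = np.trace(t, axis1=i, axis2=i + t.ndim // 2)
    d = int(np.prod([dims[i] for i in keep]))
    return t.reshape(d, d)

def ssa_gap(rho, dims):
    """H(S_12) + H(S_23) - H(S_123) - H(S_2); nonnegative by (7.22)."""
    return (entropy(partial_trace(rho, dims, [0, 1]))
            + entropy(partial_trace(rho, dims, [1, 2]))
            - entropy(rho)
            - entropy(partial_trace(rho, dims, [1])))

def random_state(d, rng):
    """Random density matrix (illustrative construction, an assumption)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(1)
dims = (2, 3, 2)
gaps = [ssa_gap(random_state(12, rng), dims) for _ in range(5)]
print(gaps)
```

For a product state $S_1\otimes S_2\otimes S_3$ the gap vanishes, which gives a simple sanity check of the partial traces.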

By using (7.17) and the convexity of the relative entropy we also obtain

Corollary 7.17. The quantum entropy is concave: for arbitrary states $S_j$ and probability
distribution $\{p_j\}$,

$$H\Big(\sum_j p_j S_j\Big) \ge \sum_j p_j H(S_j).$$

Exercise 7.18. Find a direct proof of the concavity, using the spectral decomposition
of the operator $\sum_j p_j S_j$ and the concavity of the function $\eta(t)$.


7.4 Continuity properties 

In the finite-dimensional case that we consider here, the quantum entropy is a continuous
function of a state. Indeed, the function $\eta(x) = -x\log x$ is (uniformly)
continuous on the segment $[0,1]$. Hence, $S_n\to S$ implies $\|\eta(S_n) - \eta(S)\|\to 0$,
where $\|\cdot\|$ is the operator norm. Therefore,

$$|H(S_n) - H(S)| = |\operatorname{Tr}(\eta(S_n) - \eta(S))| \le d\,\|\eta(S_n) - \eta(S)\| \to 0.$$

A more precise estimate, which we will need later, is given by the following 
Lemma 7.19 (Fannes [58]). Let $S_1$, $S_2$ be two density operators in $d$-dimensional
Hilbert space, such that $\|S_1 - S_2\|_1 \le \frac1e$. In this case,

$$|H(S_1) - H(S_2)| \le \log d\cdot\|S_1 - S_2\|_1 + \eta(\|S_1 - S_2\|_1). \tag{7.25}$$

Since the function $\eta(x)$ is monotonically increasing for $x\in[0,\frac1e]$, the lemma
implies the estimate

$$|H(S_1) - H(S_2)| \le \log d\cdot\|S_1 - S_2\|_1 + \frac{\log e}{e}. \tag{7.26}$$

Proof. Let $\lambda_1\ge\lambda_2\ge\cdots$ ($\mu_1\ge\mu_2\ge\cdots$) be the eigenvalues of $S_1$ (correspondingly, $S_2$).

Lemma 7.20. Denoting $\Delta_i = |\lambda_i - \mu_i|$, one has

$$\Delta = \sum_{i=1}^d\Delta_i \le \|S_1 - S_2\|_1. \tag{7.27}$$

Proof. Let $\nu_1\ge\nu_2\ge\cdots$ be the eigenvalues of the operator

$$A = S_1 + [S_1 - S_2]_- = S_2 + [S_1 - S_2]_+.$$

Since $A\ge S_1, S_2$, we have $\nu_i\ge\lambda_i,\mu_i$, so that

$$|\lambda_i - \mu_i| \le 2\nu_i - \lambda_i - \mu_i.$$

Therefore,

$$\Delta \le \sum_{i=1}^d(2\nu_i - \lambda_i - \mu_i) = \operatorname{Tr}(2A - S_1 - S_2) = \operatorname{Tr}[S_1 - S_2]_+ + \operatorname{Tr}[S_1 - S_2]_- = \|S_1 - S_2\|_1.\qquad\square$$

Exercise 7.21. Prove the inequality

$$|\eta(y) - \eta(x)| \le \eta(y - x),\qquad 0\le x\le y\le 1.$$

Combining these statements, we obtain

$$|H(S_1) - H(S_2)| \le \sum_{i=1}^d|\eta(\lambda_i) - \eta(\mu_i)| \le \sum_{i=1}^d\eta(\Delta_i) = \Delta\sum_{i=1}^d\eta(\Delta_i/\Delta) + \eta(\Delta) \le \Delta\log d + \eta(\Delta).$$

Taking into account the monotonicity of $\eta(x)$, and combining this with (7.27) produces (7.25). □
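The bound (7.25) can be probed numerically. The sketch below is an illustration under stated assumptions: natural logarithms, and a second state obtained by mixing in a small admixture so that the trace distance stays below $1/e$. The names `fannes_terms`, `trace_norm` and `random_state` are hypothetical.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy with natural logarithm."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def eta(x):
    """eta(x) = -x log x, with eta(0) = 0."""
    return 0.0 if x <= 0 else float(-x * np.log(x))

def trace_norm(a):
    """||A||_1 for a Hermitian matrix A: the sum of absolute eigenvalues."""
    return float(np.sum(np.abs(np.linalg.eigvalsh(a))))

def random_state(d, rng):
    """Random density matrix (illustrative construction, an assumption)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def fannes_terms(s1, s2):
    """Return (||S1-S2||_1, |H(S1)-H(S2)|, right-hand side of (7.25))."""
    d = s1.shape[0]
    t = trace_norm(s1 - s2)
    lhs = abs(entropy(s1) - entropy(s2))
    rhs = np.log(d) * t + eta(t)
    return t, lhs, rhs

rng = np.random.default_rng(2)
for _ in range(5):
    s1, s3 = random_state(4, rng), random_state(4, rng)
    s2 = 0.9 * s1 + 0.1 * s3   # guarantees ||S1 - S2||_1 <= 0.2 < 1/e
    t, lhs, rhs = fannes_terms(s1, s2)
    print(f"distance={t:.4f}  |dH|={lhs:.4f}  bound={rhs:.4f}")
```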
Exercise 7.22. Show that the relative entropy is lower semicontinuous in the
following sense: if $S_n\to S$ and $S_n'\to S'$, then

$$\liminf_{n\to\infty} H(S_n; S_n') \ge H(S; S').$$

Hint: use the continuity of the entropy, representation (7.23), and the fact that the
supremum of a family of continuous functions is lower semicontinuous.

As we shall see later, in Chapter 11, this property survives for both the entropy 
and for the relative entropy in the infinite-dimensional Hilbert space, while the global 
continuity of the entropy no longer holds. 


7.5 Information correlation, entanglement of formation 
and conditional entropy 

In the classical case, the amount of information transmitted by the channel $X\to Y$
is measured by the quantity $I(X;Y) = H(X) + H(Y) - H(XY)$. In quantum
statistics there is no general analog of the quantity $H(XY)$, as the joint distribution
of observables exists only in special cases. One such case is that of the composite
system.

Let $S_{12}$ be a state of a bipartite quantum system in the Hilbert space $\mathcal H_1\otimes\mathcal H_2$. In
this situation, an obvious quantum analogue of the Shannon mutual information is

$$I(1;2) = H(S_1) + H(S_2) - H(S_{12}) = H(S_{12}; S_1\otimes S_2). \tag{7.28}$$

We call this information correlation. From Corollary 7.4, it follows that $I(1;2)\ge 0$
and $I(1;2) = 0$ if and only if $S_{12} = S_1\otimes S_2$.

Take a bipartite system and perform the measurements of observables $M_1$, $M_2$
in the corresponding subsystems. Their joint distribution exists and is given by the
formula $p(x,y) = \operatorname{Tr} S_{12}(M_{1x}\otimes M_{2y})$. Let $I(M_1; M_2)$ denote the Shannon mutual
information between the outcomes of these measurements.

Exercise 7.23. Show that, for a pure state $S_{12}$,

$$I(1;2) = 2\max_{M_1,M_2} I(M_1; M_2). \tag{7.29}$$


In the case of the pure state $S_{12}$ we have $H(S_{12}) = 0$, and Theorem 3.10 implies
that

$$H(S_1) = H(S_2) \tag{7.30}$$

and

$$I(1;2) = H(S_1) + H(S_2) = 2H(S_1). \tag{7.31}$$

The quantity $H(S_1) = H(S_2)$ then is a natural measure of entanglement of a pure
bipartite state $S_{12}$. It is equal to zero for unentangled states and takes the maximal
value $\log d$ for the state (3.7), which explains the name “maximally entangled state”.
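As a small illustration of (7.28) and (7.31), the sketch below evaluates the information correlation for a maximally entangled state and for a product state. It assumes natural logarithms and takes the maximally entangled state to be $d^{-1/2}\sum_k |k\rangle\otimes|k\rangle$, which is an assumption about the form of the state (3.7); the function names are hypothetical.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy with natural logarithm."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def marginals(rho, d1, d2):
    """Partial states S_1 = Tr_2 S_12 and S_2 = Tr_1 S_12."""
    t = rho.reshape(d1, d2, d1, d2)
    return np.einsum('ijkj->ik', t), np.einsum('ijil->jl', t)

def information_correlation(rho, d1, d2):
    """I(1;2) = H(S_1) + H(S_2) - H(S_12), cf. (7.28)."""
    s1, s2 = marginals(rho, d1, d2)
    return entropy(s1) + entropy(s2) - entropy(rho)

def maximally_entangled(d):
    """Projector onto d^{-1/2} sum_k |k>|k> (assumed form of the state)."""
    psi = np.eye(d).reshape(d * d) / np.sqrt(d)
    return np.outer(psi, psi.conj())

d = 3
entangled = maximally_entangled(d)
product = np.kron(np.diag([0.5, 0.3, 0.2]), np.diag([0.6, 0.4, 0.0]))
print(information_correlation(entangled, d, d), 2 * np.log(d))
print(information_correlation(product, d, d))
```

The first line should print two equal numbers, in agreement with $I(1;2) = 2H(S_1) = 2\log d$, and the second a value that vanishes up to rounding.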

The question of measures of entanglement for mixed states is substantially more
complicated than that for pure states. Unlike the case of pure states there are many
different measures of entanglement for mixed states. Of main interest for us will be
entanglement of formation, which is defined for an arbitrary state $S_{12}$ in $\mathcal H_1\otimes\mathcal H_2$ as

$$E_F(S_{12}) = \inf\sum_j\pi_j H(\operatorname{Tr}_2 S_{12}^j), \tag{7.32}$$

where the infimum is taken over all possible decompositions of the density operator

$$S_{12} = \sum_j\pi_j S_{12}^j$$

as convex combinations of some density operators $S_{12}^j$ (in terms of convex analysis,
$E_F(S_{12})$ is the convex hull of the continuous function $S_{12}\to H(S_1)$).

Proposition 7.24. The function $S_{12}\to E_F(S_{12})$ is continuous, convex, and the infimum
in (7.32) is attained on a convex combination of no more than $(d_1d_2)^2$ pure
states $S_{12}^j$. Moreover, the duality relation holds:

$$E_F(S_{12}) = \max_{A_{12}}\operatorname{Tr} S_{12}A_{12}, \tag{7.33}$$

where the maximization is over the Hermitian operators $A_{12}$ in $\mathcal H_1\otimes\mathcal H_2$, satisfying

$$\operatorname{Tr} T_{12}A_{12} \le H(\operatorname{Tr}_2 T_{12}), \tag{7.34}$$

for all (pure) states $T_{12}$ in $\mathcal H_1\otimes\mathcal H_2$.

Proof. Let $\mathcal H = \mathcal H_1\otimes\mathcal H_2$, $S = S_{12}$ and $f(S) = H(\operatorname{Tr}_2 S_{12})$. In this case,
$f$ is a continuous concave function on the compact convex set of quantum states
$\mathfrak S = \mathfrak S(\mathcal H)$. The proof below uses very few of the other special properties of $f$ and
$\mathfrak S$, and could be entirely based on references to the convex duality theory. However,
we here provide a self-contained proof.

A finite collection of states with the corresponding probabilities is usually called
an ensemble¹. For our proof it is useful to consider “continual” generalization of ensembles,
described by the probability measures $\pi(dS)$ on the set $\mathfrak S(\mathcal H)$ (the concept
of a generalized ensemble will be systematically explored in Ch. 11). Now, the usual
ensembles correspond to finitely supported measures. By using general facts from
measure theory (see e.g. [164]) one can show that the set $\mathcal P(\mathfrak S(\mathcal H))$ of all probability
measures on $\mathfrak S(\mathcal H)$ is itself compact with respect to the topology of weak convergence,
which is defined as the convergence of integrals of all bounded continuous
functions on $\mathfrak S(\mathcal H)$.

¹ This notion is different from the “statistical ensemble” in Section 2.1.2, which means a statistical
sample of identically prepared systems.

Consider the functional

$$F(\pi) = \int_{\mathfrak S} f(T)\,\pi(dT), \tag{7.35}$$

on $\mathcal P(\mathfrak S(\mathcal H))$, which for finitely supported $\pi$ coincides with the minimized expression
in (7.32). From the definition, it is a continuous, affine functional. We are interested
in the minimization of this functional, on the closed convex subset $\mathcal P_S$ of probability
measures $\pi$ with the given barycenter

$$S = \int_{\mathfrak S} T\,\pi(dT). \tag{7.36}$$

The continuous functional $F(\pi)$ attains its minimum on the compact set $\mathcal P_S$. The
concavity of $f$ then implies that we can choose the minimizing measure $\pi$ to be supported
by the pure states, since we can always perform decompositions of all density
operators $T = T_{12}$ into pure states, without changing the barycenter and without
increasing the value $F(\pi)$.

Let us now show that such an extreme $\pi$ is supported by no more than $(d_1d_2)^2$
states. Note that the states supporting $\pi$ are linearly independent in the sense that if

$$\int c(T)\,T\,\pi(dT) = 0 \tag{7.37}$$

for some real bounded measurable function $c(T)$, then $c(T) = 0$ almost everywhere,
with respect to $\pi$. Indeed, it follows from (7.37) that $\pi = \frac12\pi_+ + \frac12\pi_-$, where

$$\pi_\pm(dT) = (1\pm\varepsilon c(T))\,\pi(dT)$$

for sufficiently small $\varepsilon$, so that extremality implies $\pi_\pm(dT) = \pi(dT)$, and hence
$c(T) = 0$ (mod $\pi$), i.e. almost everywhere with respect to the measure $\pi$. By taking
the matrix elements of (7.37), we see that the real Hilbert space $L^2_{\mathbb R}(\pi)$ is spanned by
$(d_1d_2)^2$ functions

$$T\to\Im\langle e_j|T|e_k\rangle,\quad 1\le j<k\le d_1d_2,$$

$$T\to\Re\langle e_j|T|e_k\rangle,\quad 1\le j\le k\le d_1d_2.$$

Indeed, any function $c(T)$ that is orthogonal to all these functions in $L^2_{\mathbb R}(\pi)$, must be
zero (mod $\pi$). The rest follows from the following exercise.

Exercise 7.25. Show that if $\dim L^2_{\mathbb R}(\pi) = n$, then $\pi$ is finitely supported by $n$
points.

The inequality $\ge$ in the duality relation follows from integrating the inequality
$f(T)\ge\operatorname{Tr} TA$ with respect to an arbitrary probability measure $\pi$ that satisfies the
constraint (7.36). In the proof of the reverse inequality we can consider only those
measures which have a fixed (but arbitrary) finite support, containing the support of
the optimal (minimizing) measure $\pi^0$. In this case, the minimization in (7.32) becomes
a finite dimensional problem with a finite number of linear constraints, defined
by (7.36). Note that these constraints also imply the constraint expressed by the normalization
of a probability measure. Applying the elementary Lagrange method, we
then obtain that there exists a Hermitian operator $A$ such that $\pi^0$ minimizes the functional

$$F(\pi) - \operatorname{Tr}\Big(\int T\,\pi(dT)\Big)A = \int[f(T) - \operatorname{Tr} TA]\,\pi(dT) \tag{7.38}$$

over all (non-normalized) positive measures $\pi$ with the given finite support. Here, the
Hermitian operator $A$ simply describes the collection of real Lagrange multipliers for
the constraints (7.36). It follows that

$$f(T) - \operatorname{Tr} TA \ge 0, \tag{7.39}$$

$$f(T) - \operatorname{Tr} TA = 0\ (\mathrm{mod}\ \pi^0). \tag{7.40}$$

The inequality (7.39) follows from the fact that otherwise the infimum of the functional
(7.38) would be $-\infty$, while the equality (7.40) guarantees that the infimum,
equal to 0, is attained. By integrating the second equality with respect to $\pi^0$, and
taking into account (7.36), we obtain $\operatorname{Tr} SA = \int f(T)\,\pi^0(dT)$ which, together with
inequality (7.39), implies that $A$ is the solution of the dual problem, and (7.33) holds
with the common value $\operatorname{Tr} SA$.

The duality relation implies that the function $E_F(S)$ is lower semicontinuous, as
the maximum of the family of continuous (affine) functions $S\to\operatorname{Tr} SA$. It now
suffices to show that it is upper semicontinuous, i.e.

$$\limsup_{n\to\infty} E_F(S^{(n)}) \le E_F(S),$$

for any sequence of states $S^{(n)}\to S$. In this proof we follow Shirokov [185]. Let

$$E_F(S) = \sum_{j=1}^N\pi_j H(\operatorname{Tr}_2 S_j), \tag{7.41}$$

where $S = \sum_{j=1}^N\pi_j S_j$. Now, if $S$ is nondegenerate, put

$$S^{(n)} = \sum_{j=1}^N\pi_j^{(n)} S_j^{(n)},$$

where

$$\pi_j^{(n)} = t_j^{(n)}\pi_j,$$

$$S_j^{(n)} = (t_j^{(n)})^{-1}(S^{(n)})^{1/2}S^{-1/2}S_jS^{-1/2}(S^{(n)})^{1/2},$$

and

$$t_j^{(n)} = \operatorname{Tr}(S^{(n)})^{1/2}S^{-1/2}S_jS^{-1/2}(S^{(n)})^{1/2}.$$

These expressions also make sense for degenerate $S$, if by $S^{-1/2}$ we denote the generalized
inverse of $S^{1/2}$, i.e. the operator that is equal to the inverse on the support of
$S$ and zero on its orthogonal complement (see [185] for detail). Now,

$$E_F(S^{(n)}) \le \sum_{j=1}^N\pi_j^{(n)} H(\operatorname{Tr}_2 S_j^{(n)}),$$

and the right-hand side tends to (7.41). Therefore

$$\limsup_{n\to\infty} E_F(S^{(n)}) \le E_F(S),$$

that is $E_F$ is upper semicontinuous, and hence continuous. □

Let us come back to the case of the pure state $S_{12}$. Note that in classical statistics,
partial states of a pure state (marginal distributions of a degenerate distribution)
are again pure, and purification has no classical counterpart. Related to this fact is
the following unusual property of the quantum analog of conditional entropy. In the
classical case, the conditional entropy is always nonnegative

$$H(X|Y) = H(XY) - H(Y) = \sum_y p_y H(X|Y = y) \ge 0. \tag{7.42}$$

Defining the quantum conditional entropy as

$$H(1|2) = H(S_{12}) - H(S_2),$$

we see that it is negative if $S_{12}$ is a pure entangled state. In other words, unlike the
classical entropy, the quantum entropy is not monotone with respect to enlargement
of the system: $H(S_1)\not\le H(S_{12})$. Nevertheless, similar to the classical case, the
following properties hold

i. Monotonicity of the conditional entropy: $H(1|23)\le H(1|2)$;

ii. $H(1|2) + H(1|3)\ge 0$.
Exercise 7.26. Show that these properties are reformulations of the strong subadditivity
of the quantum entropy (7.22). Hint: use purification of the total state.

Another useful property of the quantum conditional entropy is:

Corollary 7.27. The conditional entropy $H(1|2)$ is a concave function of the state
$S_{12}$. It is subadditive in the following sense,

$$H(13|24) \le H(1|2) + H(3|4). \tag{7.43}$$

The statements follow from Corollary 7.12 by observing that the conditional entropy
is equal to minus the entropy gain that corresponds to the partial trace channel:
$H(1|2) = -G_{\operatorname{Tr}_1}(S_{12})$.

These properties make the conditional entropy a useful tool in quantum information
theory.
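The sign behaviour of $H(1|2) = H(S_{12}) - H(S_2)$ is easy to see on two-qubit examples. The following is a minimal sketch, assuming natural logarithms; the three test states and the function name `conditional_entropy` are chosen here purely for illustration.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy with natural logarithm."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def conditional_entropy(rho, d1, d2):
    """H(1|2) = H(S_12) - H(S_2), with S_2 = Tr_1 S_12."""
    s2 = np.einsum('ijil->jl', rho.reshape(d1, d2, d1, d2))
    return entropy(rho) - entropy(s2)

# Pure entangled state (|00> + |11>)/sqrt(2)
psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
entangled = np.outer(psi, psi)

# Classically correlated state (|00><00| + |11><11|)/2
correlated = np.diag([0.5, 0.0, 0.0, 0.5])

# Uncorrelated chaotic state I/4
chaotic = np.eye(4) / 4

for name, rho in [("entangled", entangled), ("correlated", correlated), ("chaotic", chaotic)]:
    print(name, conditional_entropy(rho, 2, 2))
```

The entangled state gives $-\log 2$, the classically correlated one gives $0$, and the chaotic state gives $\log 2$.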


7.6 Entropy exchange 

Let $\Phi$ be a channel from the input space $\mathcal H_A$ to the output space $\mathcal H_B$ and let $S = S_A$
be a density operator in $\mathcal H_A$ denoting the input state. By Theorem 6.9, the channel
admits the Stinespring representation

$$\Phi[S_A] = \operatorname{Tr}_E VS_AV^*, \tag{7.44}$$

where $V$ is an isometric operator from $\mathcal H_A$ to $\mathcal H_{BE} = \mathcal H_B\otimes\mathcal H_E$ and $\mathcal H_E$ can be
thought of as the “environment” of the channel. Transmission of the state $S_A$ produces
the state of the environment

$$S_E = \operatorname{Tr}_B VS_AV^* = \tilde\Phi[S_A], \tag{7.45}$$

where $\tilde\Phi$ is the complementary channel from the input to the environment. In what
follows we will simplify notations for the entropies by omitting the notation for the
state, e.g. replace $H(S_A)$ by $H(A)$ etc., as we already did in the previous section.

Definition 7.28. The quantity $H(S_E) = H(\tilde\Phi[S_A])$, which is equal to the output entropy
of the environment, is called the entropy exchange and will be denoted $H(S,\Phi)$,
or simply $H(E)$.


By using the construction of the complementary channel from (6.42), we see that
$S_E$ is given by the density matrix $[\operatorname{Tr} SV_j^*V_k]_{j,k=1,\dots,N}$. Hence, we obtain the following
result:

Proposition 7.29. Let $\Phi[S] = \sum_{j=1}^N V_jSV_j^*$, with $\sum_{j=1}^N V_j^*V_j = I$. Then

$$H(S,\Phi) = H\Big(\big[\operatorname{Tr} SV_j^*V_k\big]_{j,k=1,\dots,N}\Big). \tag{7.46}$$
Figure 7.1. Purification with the reference system.


To obtain another useful expression for $H(S,\Phi)$, we take a reference system $\mathcal H_R$.
Then we can purify the state $S_A$ to $|\psi_{AR}\rangle\langle\psi_{AR}|$ in $\mathcal H_A\otimes\mathcal H_R$. Next, compute

$$H\big((\Phi\otimes\mathrm{Id}_R)[|\psi_{AR}\rangle\langle\psi_{AR}|]\big) = H(S_{BR}). \tag{7.47}$$

Now, the tripartite system $BRE$ described by the Hilbert space $\mathcal H_{BRE} = \mathcal H_B\otimes\mathcal H_R\otimes\mathcal H_E$
is in the pure state

$$|\psi_{BRE}\rangle = (V\otimes I_R)|\psi_{AR}\rangle. \tag{7.48}$$

Looking at the split $BR|E$ we have a bipartite system in the pure state and therefore,
by (7.30)

$$H(S_E) = H(S_{BR}). \tag{7.49}$$

Formula (7.49) provides an alternative expression for the entropy exchange as the
output entropy of the extended channel $\Phi\otimes\mathrm{Id}_R$ applied to the purified input state
$S_{AR}$. Since the left-hand side does not involve $R$, this also shows that $H(S_{BR})$ does
not depend on a particular way of purification of the state $S = S_A$.
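The two expressions (7.46) and (7.49) for the entropy exchange can be compared numerically. The sketch below is a hedged illustration: it assumes natural logarithms, uses the qubit amplitude damping channel as a convenient test case (this channel is not taken from the text), and purifies $S$ with a reference system that is a copy of the input space. All function names are hypothetical.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy with natural logarithm."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def entropy_exchange_kraus(S, kraus):
    """Entropy of the matrix [Tr S V_j^* V_k], cf. (7.46)."""
    W = np.array([[np.trace(S @ Vj.conj().T @ Vk) for Vk in kraus] for Vj in kraus])
    return entropy(W)

def entropy_exchange_purified(S, kraus):
    """H(S_BR) for S_BR = (Phi x Id_R)[|psi_AR><psi_AR|], cf. (7.47) and (7.49)."""
    d = S.shape[0]
    w, v = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)
    # |psi_AR> = sum_i sqrt(w_i) |v_i> x |i>, with R taken as a copy of A
    psi = sum(np.sqrt(w[i]) * np.kron(v[:, i], np.eye(d)[i]) for i in range(d))
    dB = kraus[0].shape[0]
    S_BR = np.zeros((dB * d, dB * d), dtype=complex)
    for V in kraus:
        phi = np.kron(V, np.eye(d)) @ psi
        S_BR += np.outer(phi, phi.conj())
    return entropy(S_BR)

def amplitude_damping(gamma):
    """Kraus operators of the qubit amplitude damping channel (test case only)."""
    return [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]]),
            np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])]

S = np.array([[0.7, 0.2], [0.2, 0.3]])
K = amplitude_damping(0.35)
print(entropy_exchange_kraus(S, K), entropy_exchange_purified(S, K))
```

Both numbers should agree, since the environment and the pair $BR$ are complementary parts of a pure tripartite state.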

The entropy exchange satisfies a simple but useful inequality: Let $S = \sum_j p_jS_j$
be a mixture of pure states $S_j$. In this case,

$$H(S,\Phi) \ge \sum_j p_j H(\Phi[S_j]). \tag{7.50}$$

Indeed, by concavity of the quantum entropy,

$$H(S,\Phi) = H\Big(\tilde\Phi\Big[\sum_j p_jS_j\Big]\Big) \ge \sum_j p_j H(\tilde\Phi[S_j]).$$

But $H(\tilde\Phi[S_j]) = H(\Phi[S_j])$, because the system $BE$ is in a pure state as soon as the
input $S_j$ is pure.
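Example 7.30. Consider the depolarizing channel (6.49) and the input chaotic state
$S = I/d$. By using representation (6.54) with the Kraus operators $V_{\alpha\beta}$ and the
properties of discrete Weyl operators, we find

$$\operatorname{Tr} SV_{\alpha\beta}^*V_{\alpha'\beta'} = \delta_{(\alpha\beta),(\alpha'\beta')}\begin{cases}\dfrac{p}{d^2}, & (\alpha\beta)\ne(00),\\[1ex] 1 - p\,\dfrac{d^2-1}{d^2}, & (\alpha\beta) = (00).\end{cases}$$

Hence,

$$H(S,\Phi) = -\Big(1 - p\,\frac{d^2-1}{d^2}\Big)\log\Big(1 - p\,\frac{d^2-1}{d^2}\Big) - p\,\frac{d^2-1}{d^2}\log\frac{p}{d^2}. \tag{7.51}$$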
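Example 7.30 can be reproduced numerically. The sketch below makes several assumptions that are not confirmed by the text: the depolarizing channel is taken in the form $\Phi[S] = (1-p)S + p\,(I/d)\operatorname{Tr} S$, the discrete Weyl operators are built in a standard clock-and-shift convention (which may differ by phases from (6.53)), and natural logarithms are used. The function names are hypothetical.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy with natural logarithm."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def weyl_operators(d):
    """Clock-and-shift operators W_ab = shift^a clock^b (assumed convention)."""
    omega = np.exp(2j * np.pi / d)
    shift = np.roll(np.eye(d), 1, axis=0)          # |k> -> |k+1 mod d>
    clock = np.diag(omega ** np.arange(d))         # |k> -> omega^k |k>
    return [np.linalg.matrix_power(shift, a) @ np.linalg.matrix_power(clock, b)
            for a in range(d) for b in range(d)]

def depolarizing_kraus(d, p):
    """Kraus operators of (1-p) S + p (I/d) Tr S built from Weyl operators."""
    ws = weyl_operators(d)
    c0 = np.sqrt(1 - p * (d**2 - 1) / d**2)
    return [c0 * ws[0]] + [np.sqrt(p) / d * w for w in ws[1:]]

def exchange_matrix(S, kraus):
    """The matrix [Tr S V_j^* V_k] from Proposition 7.29."""
    return np.array([[np.trace(S @ Vj.conj().T @ Vk) for Vk in kraus] for Vj in kraus])

def formula_7_51(d, p):
    """Right-hand side of (7.51)."""
    q = p * (d**2 - 1) / d**2
    return float(-(1 - q) * np.log(1 - q) - q * np.log(p / d**2))

d, p = 3, 0.3
kraus = depolarizing_kraus(d, p)
M = exchange_matrix(np.eye(d) / d, kraus)
print(entropy(M), formula_7_51(d, p))
```

The matrix `M` comes out diagonal with entries $1 - p(d^2-1)/d^2$ and $p/d^2$, so its entropy matches the closed form.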


7.7 Quantum mutual information 

So far, we have encountered three entropic quantities: the input entropy $H(S)$, the
output entropy $H(\Phi[S])$ and the entropy exchange $H(S,\Phi)$. The relation between
them allows us to interpret them as the sides of a triangle: the sum of any two of them
is not less than the third. For example, the difference in length of the upper two sides of
the triangle and its basis can be seen as the information correlation between $R$ and $B$:

$$\begin{aligned} I(S,\Phi) &= H(S) + H(\Phi[S]) - H(S,\Phi)\\ &= H(A) + H(B) - H(E)\\ &= H(R) + H(B) - H(BR) \ge 0 \end{aligned} \tag{7.52}$$

by the subadditivity of the quantum entropy.

Since $R$ has the same entropy as $A$, this quantity can be thought of as the quantum
mutual information between the input and output. The mutual information $I(S,\Phi)$
has a number of nice properties, similar to those of the Shannon information.

Proposition 7.31. The quantum mutual information $I(S,\Phi)$ is

i. concave in $S$;

ii. convex in $\Phi$;

iii. subadditive: $I(S_{12},\Phi_1\otimes\Phi_2)\le I(S_1,\Phi_1) + I(S_2,\Phi_2)$;

iv. satisfies the data processing inequalities:

$$I(S,\Phi_2\circ\Phi_1) \le \min\{I(S,\Phi_1),\ I(\Phi_1(S),\Phi_2)\}.$$

Proof. The proof is based on playing with different splits of the tripartite system
$BRE$, which is in the pure state (7.48), and on the properties of the conditional
entropy.
i. We have

$$\begin{aligned} I(S,\Phi) &= H(A) + H(B) - H(E)\\ &= H(BE) + H(B) - H(E)\\ &= H(B|E) + H(B), \end{aligned} \tag{7.53}$$

where the first term is concave in $S_{BE}$ by Corollary 7.27, and the second is concave
in $S_B$. It remains for us to observe that the maps $S\to S_{BE}$ and $S\to S_B$ are affine.

ii. We have

$$\begin{aligned} I(S,\Phi) &= H(R) + H(B) - H(E)\\ &= H(R) + H(B) - H(BR)\\ &= H(R) - H(R|B), \end{aligned}$$

where the first term does not depend on $\Phi$, while the second is concave in $S_{BR} =
(\Phi\otimes\mathrm{Id}_R)[S_{AR}]$, which is affine in $\Phi$.

iii. This follows from the expression $I(S,\Phi) = H(B|E) + H(B)$, see (7.53), the
subadditivity (7.43) of the conditional entropy $H(B|E)$ and the subadditivity (7.4) of
the quantum entropy $H(B)$.

iv. Denote by $A$ the input of $\Phi_1$, by $B$ the common output of $\Phi_1$ and input of $\Phi_2$,
and by $C$ the output of $\Phi_2$. Let also $E_{1,2}$ denote the environments for $\Phi_{1,2}$. We have

$$I(S,\Phi_2\circ\Phi_1) = H(R) + H(C) - H(E_1E_2). \tag{7.54}$$

To prove the first data processing inequality, note that

$$I(S,\Phi_1) = H(R) + H(B) - H(E_1), \tag{7.55}$$

so we have to show that $H(C) - H(E_1E_2)\le H(B) - H(E_1)$. But since the systems
$CRE_1E_2$ and $BRE_1$ are in pure states, this is equivalent to

$$H(R|E_1E_2) \le H(R|E_1),$$

which is true because of the monotonicity of the conditional entropy.



Figure 7.2. Composition of channels. 
To prove the second data processing inequality, note that

$$I(\Phi_1[S],\Phi_2) = H(B) + H(C) - H(E_2), \tag{7.56}$$

so we have to show that $H(R) - H(E_1E_2)\le H(B) - H(E_2)$. But since the systems
$BRE_1$ and $CRE_1E_2$ are in pure states, this is the same as $H(R) - H(CR)\le
H(RE_1) - H(CRE_1)$ or

$$H(C|RE_1) \le H(C|R).$$

However, the last inequality holds due to the monotonicity of the conditional
entropy. □
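The data processing inequalities of Proposition 7.31 can be observed on simple qubit channels. The sketch below is an illustration under assumptions: natural logarithms, channels given by Kraus operators, the entropy exchange computed via (7.46), and amplitude damping and depolarizing channels chosen as convenient test cases. The helper names are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)

def entropy(rho):
    """Von Neumann entropy with natural logarithm."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def apply(kraus, S):
    """Phi[S] = sum_j V_j S V_j^*."""
    return sum(V @ S @ V.conj().T for V in kraus)

def entropy_exchange(S, kraus):
    """H(S, Phi) as the entropy of [Tr S V_j^* V_k], cf. (7.46)."""
    W = np.array([[np.trace(S @ Vj.conj().T @ Vk) for Vk in kraus] for Vj in kraus])
    return entropy(W)

def mutual_information(S, kraus):
    """I(S, Phi) = H(S) + H(Phi[S]) - H(S, Phi), cf. (7.52)."""
    return entropy(S) + entropy(apply(kraus, S)) - entropy_exchange(S, kraus)

def compose(kraus2, kraus1):
    """Kraus operators of Phi_2 o Phi_1."""
    return [W @ V for W in kraus2 for V in kraus1]

def depolarizing(p):
    """Qubit channel (1-p) S + p I/2 (test case only)."""
    return [np.sqrt(1 - 0.75 * p) * I2] + [np.sqrt(p / 4) * P for P in (X, Y, Z)]

def amplitude_damping(g):
    """Qubit amplitude damping channel (test case only)."""
    return [np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex),
            np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)]

S = np.array([[0.6, 0.2], [0.2, 0.4]], dtype=complex)
phi1, phi2 = amplitude_damping(0.3), depolarizing(0.4)
lhs = mutual_information(S, compose(phi2, phi1))
rhs = min(mutual_information(S, phi1), mutual_information(apply(phi1, S), phi2))
print(lhs, rhs)
```

For the ideal channel the entropy exchange vanishes, so the same function returns $2H(S)$, which is a convenient consistency check.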

7.8 Notes and references 

1. The entropy of a density operator was introduced by von Neumann [212].
A detailed survey of the properties of the quantum entropy and the quantum relative
entropy can be found in the article of Wehrl [215] and in the books of Ohya and
Petz [161] and Petz [168].

The inequality (7.3) can be strengthened by replacing $\operatorname{Tr}(S - T)^2$ by $\|S - T\|_1^2$,
see [180]. The proof is based on the corresponding classical Pinsker’s inequality, see
Lemma 12.6.1 in [42].

2. A real function $g$, defined on the interval $I\subset\mathbb R$, is called operator-monotone if,
for arbitrary natural $n$ and Hermitian $n\times n$ matrices $A\le B$ with spectra in $I$, the
inequality $g(A)\le g(B)$ holds. Operator-convex functions are defined in a similar
fashion. The main results concerning operator-monotone and operator-convex functions,
as well as a number of useful matrix inequalities can be found in the book of
Bhatia [27]. By using the classical results from the theory of operator-monotone and
operator-convex functions one can show that the class $\mathcal G$, defined by the relation (7.7),
consists of operator-convex functions $g$ on $(0,\infty)$ such that $g(1) = 0$.

There are several proofs of the Monotonicity Theorem 7.6. A usual approach is to
derive it as a corollary of the celebrated “Lieb concavity” [161], [158]. In a series of
papers [146], [147], and [148], Lindblad obtained a number of approximations to the
monotonicity property and finally proved its equivalence to the strong subadditivity
of the quantum entropy, established earlier by Lieb and Ruskai [145]. Uhlmann [209]
gave a different proof based on an interpolation method. Quantum “quasi-entropies”,
including the relative g-entropies, were introduced by Petz [167], who gave a proof
of Theorem 7.7 for maps satisfying (6.5), based on an operator generalization of the
Jensen inequality. Here, we follow the paper of Lesniewski and Ruskai [143], where
a direct proof of the monotonicity was given that makes no use of the Lieb concavity.
This proof is based, after all, on a generalization of the Cauchy-Schwarz inequality
and allows us to establish the monotonicity of a whole class of invariants of pairs of
states in quantum “geometro-statistics”, initiated by Morozova and Chentzov [39].
Effros pointed to the connection of these problems with the notion of the matrix perspective
of an operator-convex function [56]. The condition for the equality in the
monotonicity property was established by Petz (see also the survey of Ruskai [173]).

A new, “natural” proof of the monotonicity property of the relative entropy was suggested
by Bjelakovic and Siegmund-Schultze [28] (see also Hayashi [78]), based on
the operational interpretation of the quantum relative entropy as the optimal error exponent
in the problem of asymptotic discrimination between two quantum states (for
the quantum analog of Stein’s Lemma in mathematical statistics, see Theorem 12.8.1
in [42]).

The entropy gain was introduced and studied by Alicki [6]. 

3. In this section we follow Lindblad [148], see also the book of Ohya and Petz [161]. 

4. Lemma 7.19 is due to Fannes [58]. Audenaert [12] obtained the sharp bound

$$|H(S_1) - H(S_2)| \le \log(d-1)\cdot\|S_1 - S_2\|_1 + h_2(\|S_1 - S_2\|_1).$$

The connection of this inequality with Fano’s Lemma 4.14 is indicated in the book of
Petz [168].

5. The information correlation was considered by Stratonovich [203] and by Lindblad [146].
The entanglement of formation was introduced in the fundamental work
of Bennett, DiVincenzo, Smolin and Wootters [21]. For mixed states, there is a whole
variety of measures of entanglement. The quantitative theory of entanglement is another
separate chapter of quantum information science, in which the mathematical
methods are substantially applied and developed, see e.g. the surveys of Keyl [128],
Alber et al. [5], and the book of Hayashi [78].

The definition and properties of the convex hull, as well as the general duality theorem
of convex programming are presented in, e.g. the books of Rockafellar [172], and
Magaril-Il’yaev and Tikhomirov [154]. The proof of upper semicontinuity of the entanglement
of formation is based on a construction from the work of Shirokov [185].

The quantum conditional entropy was introduced by Adami and Cerf [2]. The operational
interpretation of the negative values of the conditional entropy as a “credit”
for future quantum communication based on state merging protocol was proposed by
M. Horodecki, Oppenheim, and Winter [120].

6. Entropy exchange was introduced (without using this name) by Lindblad [149]
and later, independently, in the context of quantum information theory, by Barnum,
Nielsen and Schumacher [16].

7. The quantum mutual information also first appeared in the paper of Lindblad [149] 
and later, independently, in the paper of Adami and Cerf [2], who studied its properties 
in detail in an information-theoretic context. 
Basic channel capacities




Chapter 8 

The classical capacity of quantum channel 


8.1 The coding theorem 

In Chapter 5, the basic coding theorem for a classical-quantum communication channel
was established. In this chapter, we will use this result to obtain the coding theorem
for an arbitrary quantum channel.

Consider the channel $\Phi:\mathfrak T(\mathcal H_A)\to\mathfrak T(\mathcal H_B)$ and the corresponding composite,
“memoryless” channel $\Phi^{\otimes n} = \Phi\otimes\cdots\otimes\Phi:\mathfrak T(\mathcal H_A^{\otimes n})\to\mathfrak T(\mathcal H_B^{\otimes n})$. The block
code for such a composite channel comprises a c-q channel $i\to S_i^{(n)}$ encoding classical
messages $i$ into input states $S_i^{(n)}$ in the space $\mathcal H_A^{\otimes n}$, and a q-c channel (observable
$M^{(n)}$) in the space $\mathcal H_B^{\otimes n}$, decoding the output states $\Phi^{\otimes n}[S_i^{(n)}]$ into classical
messages $j$:

$$i \longrightarrow S_i^{(n)} \longrightarrow \Phi^{\otimes n}[S_i^{(n)}] \longrightarrow j.$$

This leads to the following definition.

Definition 8.1. A code $(\Sigma^{(n)}, M^{(n)})$ of length $n$ and of size $N$ for the composite
channel $\Phi^{\otimes n}$ consists of an encoding, given by a collection of states $\Sigma^{(n)} =
\{S_i^{(n)};\ i = 1,\dots,N\}$ in $\mathcal H_A^{\otimes n}$ and a decoding, described by an observable $M^{(n)} =
\{M_j^{(n)};\ j = 0,1,\dots,N\}$ in $\mathcal H_B^{\otimes n}$.

The maximal error probability of the code is equal to

$$P_e(\Sigma^{(n)}, M^{(n)}) = \max_{i=1,\dots,N}\big[1 - p_{\Sigma M}(i|i)\big], \tag{8.1}$$

where

$$p_{\Sigma M}(j|i) = \operatorname{Tr}\Phi^{\otimes n}[S_i^{(n)}]\,M_j^{(n)} \tag{8.2}$$

is the probability to make a decision $j$ under the condition that a message $i$ was sent.
The minimum of the error $P_e(\Sigma^{(n)}, M^{(n)})$ over all codes of length $n$ and size $N$ is
again denoted $p_e(n,N)$. The classical capacity $C(\Phi)$ of the quantum channel $\Phi$ is
defined as the least upper bound of the rates $R$ for which

$$\lim_{n\to\infty} p_e(n, 2^{nR}) = 0.$$
n->-oo 
Let us recall that a finite probability distribution $\pi$ on the set of quantum states
$\mathfrak S(\mathcal H)$, that assigns the probabilities $\pi_i$ to states $S_i$, is called an ensemble. If an
ensemble $\pi^{(n)}$, with probabilities $\{\pi_i^{(n)}\}$ of the input states $S_i^{(n)}$ is given, then using
the transition probability $p_{\Sigma M}(j|i)$, we can find the joint distribution of input $i$
and output $j$ and compute the Shannon information $\mathcal I_n(\pi^{(n)}, M^{(n)})$ according to formula (5.14).
Applying Shannon’s Coding Theorem, we obtain, as in Proposition 5.16,

$$C(\Phi) = \lim_{n\to\infty}\frac1n\sup_{\pi^{(n)},M^{(n)}}\mathcal I_n(\pi^{(n)}, M^{(n)}). \tag{8.3}$$

In contrast to the case of c-q channel, there is no fixed alphabet here, and one has to
optimize not only with respect to the output observable $M^{(n)}$ and the input distribution
$\{\pi_i^{(n)}\}$, but also with respect to all possible states $S_i^{(n)}$ at the input of the channel.

Proposition 8.2. The classical capacity of channel $\Phi$ is equal to

$$C(\Phi) = \lim_{n\to\infty}\frac1n C_\chi(\Phi^{\otimes n}), \tag{8.4}$$

where

$$C_\chi(\Phi) = \sup_\pi\chi(\{\pi_i\};\{\Phi[S_i]\}), \tag{8.5}$$

the quantity $\chi$ is defined by the relation (5.8) and the supremum is taken over all input
ensembles $\pi = \{\pi_i, S_i\}$ in $\mathfrak S(\mathcal H_A)$.

Note that, similar to Proposition 5.16, the limit in the relations (8.3), (8.4) as
$n\to\infty$ is equal to the supremum over $n$, due to superadditivity of the corresponding
sequences.

Proof. From the upper bound of Theorem 5.9

$$\mathcal I_n(\pi^{(n)}, M^{(n)}) \le C_\chi(\Phi^{\otimes n}).$$

The inequality $\le$ in (8.4) then follows from (8.3).

Let us show that

$$C(\Phi) \ge \lim_{n\to\infty}\frac1n C_\chi(\Phi^{\otimes n}) \equiv \bar C(\Phi). \tag{8.6}$$

Take $R < \bar C(\Phi)$. In this case, we can choose $n_0$ and an ensemble
$\pi^{(n_0)} = \{\pi_i^{(n_0)}, S_i^{(n_0)}\}$ in $\mathcal H_A^{\otimes n_0}$ such that

$$n_0R < \chi\big(\{\pi_i^{(n_0)}\};\{\Phi^{\otimes n_0}[S_i^{(n_0)}]\}\big). \tag{8.7}$$

Consider the c-q channel $\tilde\Phi$ in $\mathcal H_B^{\otimes n_0}$ defined by the formula

$$\tilde\Phi: i\to\Phi^{\otimes n_0}[S_i^{(n_0)}]. \tag{8.8}$$

By the Coding Theorem 5.4 for c-q channels, the capacity of $\tilde\Phi$ is

$$C(\tilde\Phi) = \max_\pi\chi\big(\{\pi_i\};\{\Phi^{\otimes n_0}[S_i^{(n_0)}]\}\big), \tag{8.9}$$

where the states $S_i^{(n_0)}$ are fixed and the maximum is taken over the probability distributions
$\{\pi_i\}$. By (8.7), this is greater than $n_0R$. Denoting by $\tilde p_e(n,N)$ the minimal
error probability for $\tilde\Phi$, we have

$$p_e(nn_0, 2^{(nn_0)R}) \le \tilde p_e(n, 2^{n(n_0R)}), \tag{8.10}$$

since every code of size $N$ for $\tilde\Phi$ is also a code of the same size for $\Phi$. Thus, having
chosen $R < \bar C(\Phi)$, we can make the right-hand side and hence the left-hand side
of (8.10) tend to zero as $n\to\infty$. Via the same argument as in Proposition 5.16 we
can prove that $p_e(n', 2^{n'R})\to 0$, when $n'\to\infty$, and hence (8.6) follows. □
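The quantity $\chi(\{\pi_i\};\{\Phi[S_i]\})$ maximized in (8.5) is straightforward to evaluate for a given ensemble. The following is a minimal sketch, assuming natural logarithms, a channel given by Kraus operators, and the qubit depolarizing channel $(1-p)S + p\,I/2$ as a test case; any single ensemble only gives a lower bound on $C_\chi(\Phi)$. The function names are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)

def entropy(rho):
    """Von Neumann entropy with natural logarithm."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def apply(kraus, S):
    """Phi[S] = sum_j V_j S V_j^*."""
    return sum(V @ S @ V.conj().T for V in kraus)

def holevo_chi(probs, states, kraus):
    """chi({pi_i}; {Phi[S_i]}) = H(sum_i pi_i Phi[S_i]) - sum_i pi_i H(Phi[S_i])."""
    outs = [apply(kraus, S) for S in states]
    avg = sum(p * o for p, o in zip(probs, outs))
    return entropy(avg) - sum(p * entropy(o) for p, o in zip(probs, outs))

def depolarizing(p):
    """Qubit channel (1-p) S + p I/2 (test case only)."""
    return [np.sqrt(1 - 0.75 * p) * I2] + [np.sqrt(p / 4) * P for P in (X, Y, Z)]

p = 0.4
states = [np.diag([1.0, 0.0]).astype(complex), np.diag([0.0, 1.0]).astype(complex)]
chi_value = holevo_chi([0.5, 0.5], states, depolarizing(p))
print(chi_value)
```

For this ensemble each output state has eigenvalues $1 - p/2$ and $p/2$ while the average output is chaotic, so the value equals $\log 2 - h(p/2)$ with $h$ the binary entropy.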


8.2 The χ-capacity


We shall call the quantity $C_\chi(\Phi)$, defined in (8.5), the χ-capacity of the channel $\Phi$.
Thus,

$$C_\chi(\Phi) = \sup_{\pi} \left[ H\Big(\sum_i \pi_i \Phi[S_i]\Big) - \sum_i \pi_i H(\Phi[S_i]) \right], \tag{8.11}$$

where $\pi = \{\pi_i, S_i\}$. It can also be rewritten as

$$C_\chi(\Phi) = \sup_{S \in \mathfrak{S}(\mathcal{H})} \left[ H(\Phi[S]) - \hat{H}_\Phi(S) \right], \tag{8.12}$$

where the new channel characteristic

$$\hat{H}_\Phi(S) = \inf_{\pi:\ \sum_i \pi_i S_i = S}\ \sum_i \pi_i H(\Phi[S_i]) \tag{8.13}$$

was introduced, which is called the convex hull of the output entropy $H(\Phi[S])$. Here,
the infimum is taken over all ensembles $\pi = \{\pi_i, S_i\}$ with the fixed average state
$\sum_i \pi_i S_i = S$. Note that, by concavity of the quantum entropy, it is sufficient to take
the infimum with respect to the convex decompositions of $S$ into the pure states $S_i$.


Exercise 8.3. Prove that for the ideal channel $\hat{H}_{\mathrm{Id}}(S) = 0$. Hint: Consider the
spectral decomposition of $S$.

The quantity (8.13) is closely related to the entanglement of formation (7.32),
namely

$$\hat{H}_\Phi(S) = E_F(V S V^*),$$

where $V$ is the isometry in the Stinespring representation (6.7) of the channel $\Phi$, so
that $V S V^*$ is the state of the composite system "output-environment". The proof of
the following lemma is similar to that of Proposition 7.24, with the dimensionality
$d_1 d_2$ of $\mathcal{H}_1 \otimes \mathcal{H}_2$ replaced with the dimensionality $d_A$ of $\mathcal{H}_A$.

Lemma 8.4. The convex hull of the output entropy $\hat{H}_\Phi(S)$ is a continuous convex
function on $\mathfrak{S}(\mathcal{H}_A)$. The infimum in (8.13) is attained on an ensemble consisting of no
more than $d_A^2$ pure states, $d_A = \dim \mathcal{H}_A$.

Corollary 8.5. The supremum in the expression (8.11) for the χ-capacity is attained
on an ensemble consisting of no more than $d_A^2$ pure states.

Exercise 8.6. By using Exercise 4.8, show that an ensemble $\pi = \{\pi_i, S_i\}$ with
the average state $\bar{S}_\pi = \sum_i \pi_i S_i$ is optimal for the supremum in expression (8.11)
if and only if the following maximal distance condition holds: there exists a
positive $\mu$ such that

$$H(\Phi[S]; \Phi[\bar{S}_\pi]) \le \mu \quad \text{for all (pure) input states } S, \tag{8.14}$$

with equality for members $S = S_i$ of the ensemble with $\pi_i > 0$. In this case,
necessarily, $\mu = C_\chi(\Phi)$. Hint:

$$\frac{\partial}{\partial \pi_j} \chi\left(\{\pi_i\}; \{\Phi[S_i]\}\right) = H(\Phi[S_j]; \Phi[\bar{S}_\pi]) - \log e.$$

The χ-capacity can be computed for a number of interesting channels.

Exercise 8.7. Show that $C_\chi(\mathrm{Id}) = \log d$ for the ideal channel in $d$-dimensional
Hilbert space. For the quantum erasure channel $C_\chi(\Phi_p) = (1 - p) \log d$, where
$p$ is the probability of erasure. In both cases the optimal ensemble consists of
equiprobable pure states corresponding to an orthonormal basis in $\mathcal{H}$.
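
As a numerical sanity check of the values quoted in Exercise 8.7, the quantity (8.11) can be evaluated directly for an equiprobable ensemble of orthonormal basis states. The sketch below is only illustrative: the helper names (`entropy`, `chi`, `erasure`) are hypothetical, logarithms are taken to base 2, and the erasure channel is assumed to have the standard form $\Phi_p[S] = (1-p) S \oplus p\,(\mathrm{Tr}\,S)\,|e\rangle\langle e|$ with an additional orthogonal flag state, a definition this section does not restate.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy H(rho) = -Tr rho log2 rho, in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def chi(probs, outputs):
    """chi = H(sum_i pi_i rho_i) - sum_i pi_i H(rho_i), cf. (8.11)."""
    avg = sum(p * r for p, r in zip(probs, outputs))
    return entropy(avg) - sum(p * entropy(r) for p, r in zip(probs, outputs))

def erasure(S, p):
    """Assumed erasure channel: output space C^d (+) C, last level flags erasure."""
    d = S.shape[0]
    out = np.zeros((d + 1, d + 1), dtype=complex)
    out[:d, :d] = (1 - p) * S
    out[d, d] = p * np.trace(S).real
    return out

d, p = 3, 0.25
basis = [np.outer(e, e).astype(complex) for e in np.eye(d)]  # |e_i><e_i|
probs = [1.0 / d] * d

chi_ideal = chi(probs, basis)                                # ideal channel
chi_erasure = chi(probs, [erasure(S, p) for S in basis])     # erasure channel
print(chi_ideal, np.log2(d))
print(chi_erasure, (1 - p) * np.log2(d))
```

The check only confirms that the basis ensemble attains the stated values; proving that no other ensemble does better is the content of the exercise.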


Proposition 8.8. Let $\Phi$ be a covariant channel (see Section 6.7),

$$\Phi[U_g^A S (U_g^A)^*] = U_g^B \Phi[S] (U_g^B)^*.$$

If the representation $U_g^A$ is irreducible (in which case $\Phi$ is called irreducibly covariant), then

$$C_\chi(\Phi) = H\left(\Phi\left[\frac{I_A}{d_A}\right]\right) - \min_S H(\Phi[S]), \tag{8.15}$$

where the minimum is taken over the set of pure states.

Proof. Assume first that the symmetry group is finite. Irreducibility of the representation means that it has no nontrivial invariant subspaces, which is equivalent to the following: any operator that commutes with all $U_g^A$ is a multiple of the unit operator. It follows, similar to the proof of relation (6.53), that for all states $S$

$$\frac{1}{|G|} \sum_{g \in G} U_g^A S (U_g^A)^* = \frac{I_A}{d_A}. \tag{8.16}$$

Let us show that

$$\max_{S \in \mathfrak{S}(\mathcal{H}_A)} H(\Phi[S]) = H\left(\Phi\left[\frac{I_A}{d_A}\right]\right). \tag{8.17}$$

This follows from the fact that the function $S \to H(\Phi[S])$ is concave and invariant with respect to the transformations $S \to U_g^A S (U_g^A)^*$. Therefore, for arbitrary state $S$,

$$H(\Phi[S]) = |G|^{-1} \sum_{g \in G} H\left(\Phi[U_g^A S (U_g^A)^*]\right) \le H\left(\Phi\Big[|G|^{-1} \sum_{g \in G} U_g^A S (U_g^A)^*\Big]\right).$$

The inequality $\le$ in (8.15) then follows from (8.16), and it is sufficient to show that

$$C_\chi(\Phi) \ge H\left(\Phi\left[\frac{I_A}{d_A}\right]\right) - \min_S H(\Phi[S]). \tag{8.18}$$

Let us choose a state $S_0$ that minimizes the output entropy. Since the entropy is concave, it attains the minimum on pure states. Then the value in the right hand side of (8.15) is attained for the ensemble of states $S_g = U_g^A S_0 (U_g^A)^*$, $g \in G$, with equal probabilities $\pi_g = |G|^{-1}$.

If the group is continuous, a similar argument applies, but the optimizing distribution will be continuous, namely the uniform distribution on $G$. One can then use a finite approximation and the continuity of the entropy. □
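
The averaging identity (8.16) is the step that drives the whole proof, and it is easy to observe numerically in the smallest case. The following sketch is an illustration under stated assumptions, not part of the text's argument: it takes the four Pauli matrices as an irreducible (projective) representation of $\mathbb{Z}_2 \times \mathbb{Z}_2$ on a qubit, and the names `paulis` and `twirl` are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [I2, X, Y, Z]  # assumed irreducible (projective) representation

def twirl(S):
    """Group average |G|^{-1} sum_g U_g S U_g^*, the left side of (8.16)."""
    return sum(U @ S @ U.conj().T for U in paulis) / len(paulis)

# A random qubit density operator.
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
S = A @ A.conj().T
S = S / np.trace(S).real

twirled = twirl(S)
print(np.round(twirled, 12))  # expected to be close to I/2
```

Whatever state is drawn, the average collapses to the chaotic state $I/2$, which is why the maximal output entropy in (8.17) is reached at $I_A/d_A$.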


Example 8.9. For the depolarizing channel (6.49)

$$C_\chi(\Phi) = \log d + \left(1 - p\frac{d-1}{d}\right) \log\left(1 - p\frac{d-1}{d}\right) + p\frac{d-1}{d} \log\frac{p}{d}, \tag{8.19}$$

which is achieved for an ensemble of equiprobable, orthogonal pure states forming an
orthonormal basis in $\mathcal{H}$.

Indeed, the channel is irreducibly covariant, hence (8.15) applies. The output entropy is concave. Hence, it achieves the minimum on the set of pure states. For an arbitrary pure input state, the output $\Phi[|\psi\rangle\langle\psi|]$ has the simple eigenvalue $\left(1 - p\frac{d-1}{d}\right)$ and $d - 1$ eigenvalues $\frac{p}{d}$. Hence, the output entropy has the same value

$$-\left(1 - p\frac{d-1}{d}\right) \log\left(1 - p\frac{d-1}{d}\right) - p\frac{d-1}{d} \log\frac{p}{d}$$

for all pure input states, and (8.19) follows.
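
The closed form (8.19) can be compared with a direct evaluation of (8.15). The sketch below assumes that (6.49) has the form $\Phi[S] = (1-p) S + p\,\frac{I}{d}\,\mathrm{Tr}\,S$, which is consistent with the eigenvalues quoted above but is not restated in this section; the function names are hypothetical and logarithms are base 2.

```python
import numpy as np

def xlog2x(x):
    """x * log2(x) with the convention 0 * log 0 = 0."""
    return 0.0 if x <= 0 else float(x * np.log2(x))

def depolarize(S, p):
    """Assumed form of the depolarizing channel (6.49)."""
    d = S.shape[0]
    return (1 - p) * S + p * np.trace(S) * np.eye(d) / d

def entropy(rho):
    """Von Neumann entropy in bits."""
    return -sum(xlog2x(x) for x in np.linalg.eigvalsh(rho) if x > 1e-12)

def chi_capacity_formula(d, p):
    """Right-hand side of (8.19)."""
    a = 1 - p * (d - 1) / d      # simple eigenvalue of the output
    b = p / d                    # (d-1)-fold eigenvalue of the output
    return float(np.log2(d)) + xlog2x(a) + (d - 1) * xlog2x(b)

d, p = 4, 0.3
psi = np.zeros(d, dtype=complex)
psi[0] = 1.0                     # any pure input gives the same spectrum
out = depolarize(np.outer(psi, psi.conj()), p)

# (8.15): H(Phi[I/d]) = log d for this unital channel, minus the
# output entropy of a pure input state.
numeric = float(np.log2(d)) - entropy(out)
closed = chi_capacity_formula(d, p)
print(numeric, closed)
```

At $p = 0$ the formula reduces to $\log d$, the value for the ideal channel.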

Example 8.10. Let us compute the χ-capacity of an arbitrary qubit unital channel $\Phi$.
Such a channel is irreducibly covariant with respect to the representation (6.65) (see
Exercise 6.44). Hence, we can again apply Proposition 8.8.

In computing $\min_{S \in \mathfrak{S}(\mathcal{H})} H(\Phi[S])$ it is sufficient to consider $\Phi = \Lambda$, by taking
into account the unitary invariance of the quantum entropy. We also remark that by
Exercise 2.5 the entropy of the qubit state (2.8) is equal to

$$H(S(\mathbf{a})) = h_2\left(\frac{1 - |\mathbf{a}|}{2}\right). \tag{8.20}$$

Now, the Bloch ball is contracted, by the channel $\Lambda$, to the ellipsoid with lengths of
the axes $|\lambda_\gamma|$, $\gamma = x, y, z$, and the minimal output entropy is attained at the top of the
longest axis, which corresponds to the state with the eigenvalues $\frac{1 \pm \max_\gamma |\lambda_\gamma|}{2}$. This
provides the χ-capacity

$$C_\chi(\Phi) = 1 - h_2\left(\frac{1 - \max_\gamma |\lambda_\gamma|}{2}\right). \tag{8.21}$$
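
The geometric argument of Example 8.10 can be mimicked numerically by sampling pure states on the Bloch sphere and tracking the smallest output entropy. This is a hedged sketch: the diagonal action $\mathbf{a} \to (\lambda_x a_x, \lambda_y a_y, \lambda_z a_z)$ is the assumed canonical form $\Lambda$, the sample parameters are arbitrary values chosen to satisfy the complete positivity conditions, and the function names are hypothetical.

```python
import math
import random

def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def chi_capacity_unital(lams):
    """Closed form 1 - h2((1 - max|lambda|)/2) for a unital qubit channel."""
    return 1.0 - h2((1.0 - max(abs(l) for l in lams)) / 2.0)

def min_output_entropy_sampled(lams, samples=20000, seed=1):
    """Minimum of h2((1 - |Lambda a|)/2) over random unit Bloch vectors a."""
    rng = random.Random(seed)
    best = 1.0
    for _ in range(samples):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        a = [c / norm for c in v]                     # pure input state
        r = math.sqrt(sum((l * c) ** 2 for l, c in zip(lams, a)))
        best = min(best, h2((1.0 - r) / 2.0))
    return best

lams = (0.9, 0.5, 0.45)   # illustrative contraction parameters
sampled = min_output_entropy_sampled(lams)
exact = 1.0 - chi_capacity_unital(lams)
print(sampled, exact)
```

The sampled minimum approaches the exact value from above, since the sampling only approximately hits the tip of the longest axis.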

8.3 The additivity problem 

8.3.1 The effect of entanglement in encoding and decoding 

Several interesting and difficult mathematical problems in quantum information theory are related to the proof or disproof of the additivity property:

$$C_\chi(\Phi_1 \otimes \Phi_2) = C_\chi(\Phi_1) + C_\chi(\Phi_2), \tag{8.22}$$

for some channels $\Phi_1, \Phi_2$. Obviously, always

$$C_\chi(\Phi_1 \otimes \Phi_2) \ge C_\chi(\Phi_1) + C_\chi(\Phi_2).$$

If (8.22) is fulfilled for the channel $\Phi = \Phi_1$ and an arbitrary channel $\Phi_2$, then

$$C_\chi(\Phi^{\otimes n}) = n C_\chi(\Phi) \tag{8.23}$$

and due to relation (8.4) the classical capacity of the channel $\Phi$ is equal to the χ-capacity,

$$C(\Phi) = C_\chi(\Phi). \tag{8.24}$$

Property (8.22) was established for some important classes of channels $\Phi = \Phi_1$ (and arbitrary $\Phi_2$) such as entanglement-breaking channels (Shor [190]), unital qubit channels (King [129]), depolarizing channels (King [130]), and several others. In particular, equality (8.24) holds for the classical capacity of all these channels. There was a strong belief that the additivity should hold for arbitrary channels $\Phi_1, \Phi_2$, supported by the absence of counterexamples and intensive numerical search in low dimensions. However, it was finally shown that in very high dimensions there exist channels that violate (8.22), (8.23), (8.24). We will discuss these results later, in Section 8.3.5.

The additivity of the Shannon capacity for classical channels, after all, relies upon the absence of entanglement in the classical systems. To see this, let us try to generalize the classical proof based on the Kuhn-Tucker conditions (see Proposition 4.9) to the quantum case. Let $\Phi_1, \Phi_2$ be two channels with the optimal ensembles having averages $\bar{S}_{\pi^1}, \bar{S}_{\pi^2}$. We wish to prove that

$$C_\chi(\Phi_1 \otimes \Phi_2) \le C_\chi(\Phi_1) + C_\chi(\Phi_2). \tag{8.25}$$

Then condition (8.14), applied to the product channel $\Phi_1 \otimes \Phi_2$, requires the following inequality

$$H\left((\Phi_1 \otimes \Phi_2)[S_{12}]; (\Phi_1 \otimes \Phi_2)[\bar{S}_{\pi^1} \otimes \bar{S}_{\pi^2}]\right) \le C_\chi(\Phi_1) + C_\chi(\Phi_2) \tag{8.26}$$

to hold for all pure input states $S_{12}$ of the product channel. Moreover, equality must hold for the tensor products of the members of the optimal ensembles with positive probabilities, which easily follows from the corresponding equalities for the channels $\Phi_1, \Phi_2$, as well as the inequality (8.26) for product states. However, proving (8.26) for entangled states is no simpler than proving the inequality

$$C_\chi(\Phi_1 \otimes \Phi_2) \le C_\chi(\Phi_1) + C_\chi(\Phi_2),$$

which is the main problem with (8.22), since the opposite inequality follows from the definition of $C_\chi$.

Additivity of $C_\chi(\Phi)$ would have a surprising physical consequence - it would mean that using entangled input states does not increase the classical capacity of a quantum channel. However, as shown in Section 5.4, using entangled observables at the output of any essentially quantum channel (i.e. a channel with noncommuting output states) increases the capacity. From this point of view it is surprising that the additivity of $C_\chi(\Phi)$ is a rather widespread phenomenon. Let us discuss entangled inputs in more detail.

The encoding $i \to S_i^{(n)}$ in the Definition 8.1 of a code will be called unentangled if

$$S_i^{(n)} = \sum_{x^n} p(x^n | i)\, S_{x_1} \otimes \cdots \otimes S_{x_n}, \tag{8.27}$$

where $x^n = (x_1, \ldots, x_n)$, $S_{x_k}$ is a state in the $k$-th copy of the space $\mathcal{H}_A$ and $p(x^n | i)$ is a conditional probability describing preliminary processing of the input message. Let us recall that a decoding $M^{(n)}$ is unentangled if it has the form (5.37).

Given a quantum channel $\Phi$, one can define the four classical capacities $C_{1,1}$, $C_{1,\infty}$, $C_{\infty,1}$, $C_{\infty,\infty}$, where the first (second) index refers to the inputs (outputs), $\infty$ means using arbitrary entangled inputs (outputs), and 1 means restriction to unentangled inputs (outputs) in Definition 8.1 and correspondingly in the formula (8.3). By definition,

$$C_{\infty,\infty} = C(\Phi).$$


Exercise 8.11. Show, by using (8.27), the data-processing inequality (4.20), and
the Shannon Coding Theorem 4.13, that

$$C_{1,1} = \max_{\pi, M} \mathcal{I}_1(\pi, M).$$

Here, $\pi$ is an arbitrary ensemble at the input of the channel $\Phi$, and $M$ is an
arbitrary observable at the output.

The quantity $C_{1,1}$ is sometimes called the Shannon capacity of the quantum channel $\Phi$.

Exercise 8.12. By using the Coding Theorem 5.4 for classical-quantum channels, show that

$$C_{1,\infty} = C_\chi(\Phi).$$

The relations between these four capacities are shown in the following diagram:

$$\begin{array}{ccc} C_{\infty,1} & \lneqq & C_{\infty,\infty}\ (= C) \\ \| & & \vee\!\mid \\ C_{1,1} & \lneqq & C_{1,\infty}\ (= C_\chi) \end{array} \tag{8.28}$$

where the vertical relations read $C_{1,1} = C_{\infty,1}$ and $C_{1,\infty} \le C_{\infty,\infty}$, and $\lneqq$ should be understood as "always less than or equal to, and strictly less for some channels".

Equality

$$C_{1,1} = C_{\infty,1} \tag{8.29}$$

shows that using entangled input states with unentangled output measurements does not increase the accessible information. It will be proven below. The upper inequality $C_{\infty,1} \lneqq C_{\infty,\infty}$ follows from this and from the lower inequality, which expresses the strict superadditivity with respect to the output measurements demonstrated in Section 5.4.

Proof of the equality (8.29). Let $\Phi$ be a channel, let $\pi$ be a probability distribution, assigning probability $\pi_k$ to a state $S_k$ at the input of the channel, and let $M = \{M_j\}$ be the observable measured at the output of the channel. Denote by $\mathcal{I}_\Phi(\pi, M)$ the Shannon information corresponding to the input probability distribution $\{\pi_k\}$ and transition probability $p(j|k) = \mathrm{Tr}\, \Phi[S_k] M_j = \mathrm{Tr}\, S_k \Phi^*[M_j]$. In order to prove (8.29), it is sufficient to establish the inequality

$$\max_{\pi} \mathcal{I}_{\Phi_1 \otimes \Phi_2}(\pi, M^1 \otimes M^2) \le \max_{\pi^1} \mathcal{I}_{\Phi_1}(\pi^1, M^1) + \max_{\pi^2} \mathcal{I}_{\Phi_2}(\pi^2, M^2). \tag{8.30}$$

In terms of the classical mutual information,

$$\mathcal{I}_\Phi(\pi, M) = I(X; Y),$$

where $X$ is the input random variable, taking values $k$, and $Y$ is the output random variable, taking values $j$. In order to prove (8.30), consider the states $S_k$ in the Hilbert space $\mathcal{H}_1 \otimes \mathcal{H}_2$ of the channel $\Phi_1 \otimes \Phi_2$, with a product observable $M^1 \otimes M^2$. In this case, the conditional probability is

$$p(j_1, j_2 | k) = \mathrm{Tr}\, S_k \left(\Phi_1^*[M^1_{j_1}] \otimes \Phi_2^*[M^2_{j_2}]\right) = p^1(j_1 | j_2, k)\, p^2(j_2 | k), \tag{8.31}$$

where

$$p^1(j_1 | j_2, k) = \mathrm{Tr}\, S^1_{j_2 k} \Phi_1^*[M^1_{j_1}], \qquad p^2(j_2 | k) = \mathrm{Tr}\, S^2_k \Phi_2^*[M^2_{j_2}],$$

and

$$S^2_k = \mathrm{Tr}_1 S_k, \qquad S^1_{j_2 k} = \frac{\mathrm{Tr}_2\, S_k \left(I \otimes \Phi_2^*[M^2_{j_2}]\right)}{\mathrm{Tr}\, S_k \left(I \otimes \Phi_2^*[M^2_{j_2}]\right)}.$$

Here, $\mathrm{Tr}_s$ denotes the partial trace in the $s$-th subsystem ($s = 1, 2$) in $\mathcal{H}_1 \otimes \mathcal{H}_2$. We then have

$$I(X; Y_1 Y_2) = H(Y_1 Y_2) - H(Y_1 Y_2 | X),$$

where $H(\cdot)$, $H(\cdot|\cdot)$ are the entropy and conditional entropy, respectively, of the random variables describing the common classical input ($X$) and the outputs ($Y_1, Y_2$) of the two channels. By the subadditivity of the classical entropy,

$$H(Y_1 Y_2) \le H(Y_1) + H(Y_2).$$

On the other hand,

$$H(Y_1 Y_2 | X) = H(Y_1 | Y_2 X) + H(Y_2 | X).$$

Combining, we get

$$I(X; Y_1 Y_2) \le I(X Y_2; Y_1) + I(X; Y_2),$$

which, due to (8.31), amounts to

$$\mathcal{I}_{\Phi_1 \otimes \Phi_2}(\pi, M^1 \otimes M^2) \le \mathcal{I}_{\Phi_1}(\pi^1, M^1) + \mathcal{I}_{\Phi_2}(\pi^2, M^2),$$

where $\pi^1$ is the probability distribution that assigns probability $\pi_k p^2(j_2 | k)$ to the state $S^1_{j_2 k}$, and $\pi^2$ is the probability distribution that assigns probability $\pi_k$ to the state $S^2_k$. Taking the maximum over $\pi$ and the states $S_k$ provides (8.30). □
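
The purely classical step of this proof, $I(X; Y_1 Y_2) \le I(X Y_2; Y_1) + I(X; Y_2)$, holds for every joint distribution of $(X, Y_1, Y_2)$ and can be checked on random examples. The sketch below is illustrative: the alphabet sizes, the seeds, and the function names are arbitrary choices, and logarithms are base 2.

```python
import numpy as np

def H(p):
    """Shannon entropy (bits) of a probability array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def both_sides(seed, shape=(3, 4, 2)):
    """Return (I(X;Y1Y2), I(XY2;Y1) + I(X;Y2)) for a random joint p(x,y1,y2)."""
    rng = np.random.default_rng(seed)
    p = rng.random(shape)
    p /= p.sum()
    px = p.sum(axis=(1, 2))
    py1 = p.sum(axis=(0, 2))
    py2 = p.sum(axis=(0, 1))
    py1y2 = p.sum(axis=0)
    pxy2 = p.sum(axis=1)
    lhs = H(px) + H(py1y2) - H(p)                            # I(X; Y1 Y2)
    rhs = (H(pxy2) + H(py1) - H(p)) + (H(px) + H(py2) - H(pxy2))
    return lhs, rhs

results = [both_sides(seed) for seed in range(20)]
print(all(lhs <= rhs + 1e-12 for lhs, rhs in results))
```

The gap between the two sides is exactly $H(Y_1) + H(Y_2) - H(Y_1 Y_2)$, i.e. the mutual information between the two outputs, which is the subadditivity used in the proof.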

8.3.2 A hierarchy of additivity properties 

An important characteristic of a quantum channel $\Phi$, which already appeared in Section 8.2, is the minimal output entropy

$$\check{H}(\Phi) = \min_{S} H(\Phi[S]). \tag{8.32}$$

The corresponding hypothetical additivity property is

$$\check{H}(\Phi_1 \otimes \Phi_2) = \check{H}(\Phi_1) + \check{H}(\Phi_2). \tag{8.33}$$

By the concavity of the entropy, the minimum in (8.32) is attained on a pure state $S \in \mathrm{extr}\, \mathfrak{S}(\mathcal{H})$. Therefore, the classical analog of the quantity $\check{H}(\Phi)$ is additive, because any pure state of a composite classical system is a product of the pure states of the subsystems. Instead, for a tensor product $\mathcal{H}_1 \otimes \mathcal{H}_2$ describing a composite quantum system, one has

$$\mathrm{extr}\, \mathfrak{S}(\mathcal{H}_1 \otimes \mathcal{H}_2) \supsetneq \mathrm{extr}\, \mathfrak{S}(\mathcal{H}_1) \times \mathrm{extr}\, \mathfrak{S}(\mathcal{H}_2), \tag{8.34}$$

due to the presence of the entangled states. Hence, there is no obvious reason for the additivity (8.33) in the quantum case.

Clearly, inequality $\le$ always holds in (8.33). This implies that (8.33) obviously holds for channels with zero minimal output entropy, i.e. those for which there exists a pure output state, such as the ideal channel.

Exercise 8.13. Prove the following statement: the additivity properties (8.22) and (8.33) are equivalent for a pair of covariant channels satisfying the conditions of Proposition 8.8. Hint: use the fact that the tensor product of the irreducible representations of two symmetry groups $G_1, G_2$ is an irreducible representation of the group $G_1 \times G_2$.

Exercise 8.14. Consider the minimal entropy gain

$$G(\Phi) = \min_{S} \left[ H(\Phi[S]) - H(S) \right].$$

Use the superadditivity property (7.21) to show that it is additive:

$$G(\Phi_1 \otimes \Phi_2) = G(\Phi_1) + G(\Phi_2). \tag{8.35}$$

Now consider the convex hull $\hat{H}_\Phi(S)$ of the output entropy, defined by relation (8.13). The hypothetical superadditivity property for this quantity is formulated in the following way: for arbitrary state $S_{12} \in \mathfrak{S}(\mathcal{H}_1 \otimes \mathcal{H}_2)$ and channels $\Phi_1, \Phi_2$

$$\hat{H}_{\Phi_1 \otimes \Phi_2}(S_{12}) \ge \hat{H}_{\Phi_1}(S_1) + \hat{H}_{\Phi_2}(S_2), \tag{8.36}$$

where $S_1, S_2$ are the partial traces of $S_{12}$ in $\mathcal{H}_1, \mathcal{H}_2$.

This superadditivity property is maximal in the sense that its validity for a pair of channels $\Phi_1, \Phi_2$ implies all the other additivity properties for this pair of channels.

Proposition 8.15. For two given channels $\Phi_1, \Phi_2$ the superadditivity property (8.36) implies both the additivity of the minimal output entropy (8.33) and of the χ-capacity (8.22).

Proof. Indeed, let $S^0_{12}$ be a minimizer for $H((\Phi_1 \otimes \Phi_2)[S_{12}])$. In this case,

$$\check{H}(\Phi_1 \otimes \Phi_2) = H((\Phi_1 \otimes \Phi_2)[S^0_{12}]) = \hat{H}_{\Phi_1 \otimes \Phi_2}(S^0_{12}) \ge \hat{H}_{\Phi_1}(S^0_1) + \hat{H}_{\Phi_2}(S^0_2) \ge \check{H}(\Phi_1) + \check{H}(\Phi_2),$$

from which (8.33) follows. On the other hand, (8.36) and the subadditivity of the quantum entropy imply

$$\begin{aligned} H((\Phi_1 \otimes \Phi_2)[S_{12}]) - \hat{H}_{\Phi_1 \otimes \Phi_2}(S_{12}) &\le H((\Phi_1 \otimes \Phi_2)[S_{12}]) - \hat{H}_{\Phi_1}(S_1) - \hat{H}_{\Phi_2}(S_2) \\ &\le \left[H(\Phi_1[S_1]) - \hat{H}_{\Phi_1}(S_1)\right] + \left[H(\Phi_2[S_2]) - \hat{H}_{\Phi_2}(S_2)\right]. \end{aligned} \tag{8.37}$$

By using (8.12), we get

$$C_\chi(\Phi_1 \otimes \Phi_2) \le C_\chi(\Phi_1) + C_\chi(\Phi_2),$$

i.e. (8.22). □

Moreover, the property (8.36) is closely related to the additivity of the χ-capacity with input constraints. Let $F$ be a positive operator in the input Hilbert space $\mathcal{H}$ of the channel, and $E$ a positive constant. Consider the χ-capacity under the linear constraint $\mathrm{Tr}\, SF \le E$ on the input state $S$:

$$C_\chi(\Phi, F, E) = \max_{S:\ \mathrm{Tr}\, SF \le E} \left[ H(\Phi[S]) - \hat{H}_\Phi(S) \right]. \tag{8.38}$$

Exercise 8.16. Assume that we have two channels $\Phi_1, \Phi_2$ with corresponding constraint operators $F_1, F_2$. In this case, inequality (8.36), along with (8.12), implies

$$C_\chi(\Phi_1 \otimes \Phi_2, F_1 \otimes I_2 + I_1 \otimes F_2, E) = \max_{E_1 + E_2 = E} \left[ C_\chi(\Phi_1, F_1, E_1) + C_\chi(\Phi_2, F_2, E_2) \right]. \tag{8.39}$$

The property of superadditivity (8.36) is closely related to the following hypothetical property of the entanglement of formation: let $\mathcal{H}_1 = \mathcal{H}_1^A \otimes \mathcal{H}_1^B$ and $\mathcal{H}_2 = \mathcal{H}_2^A \otimes \mathcal{H}_2^B$, and let $S_{12}^{AB}$ be a state in $\mathcal{H}_1 \otimes \mathcal{H}_2$, then

$$E_F(S_{12}^{AB}) \ge E_F(S_1^{AB}) + E_F(S_2^{AB}), \tag{8.40}$$

where the entanglement of formation is defined relative to the subdivision $A|B$. Namely, if property (8.40) held for all states $S_{12}^{AB}$, this would imply (8.36) for all channels $\Phi_1, \Phi_2$ and all states $S_{12}$, while if (8.36) held for all states $S_{12}$ and partial trace channels, (8.40) would hold for all states $S_{12}^{AB}$.

The most complete conditional result in this direction is the following. 

Theorem 8.17. The conjectured properties (8.33), (8.22), (8.23) and (8.36) are globally equivalent in the following sense: assuming that one of them holds true for all channels $\Phi_1, \Phi_2$, any other is also true for all channels. Moreover, they are globally equivalent to the superadditivity of the entanglement of formation (8.40) and to the additivity of the χ-capacity with arbitrary input constraints.

When (parts of) this theorem were proved, there was hope that it would help to reduce the proof of more complicated global conjectures, such as additivity of the χ-capacity or superadditivity of entanglement of formation, to a seemingly less complicated one - the global additivity of the minimal output entropy. However, with the disproof of the last conjecture, this theorem implies that none of the additivity properties holds globally (see Section 8.4 for more detail).

We will not consider the proof of this theorem here; instead we discuss some re¬ 
stricted classes of channels for which the maximal property (8.36) can be proved. 

8.3.3 Some entropy inequalities 

Here, we obtain inequalities that allow us to prove, in some cases, the superadditivity
of the convex hull (8.36) and hence the additivity of the minimal output entropy (8.33)
and of the χ-capacity (8.22).

Consider a state $S$ and a non-ideal measurement, described by a collection of operators $A_k$ that satisfy the normalization condition $\sum_k A_k^* A_k = I$, so that an outcome $k$ occurs with probability $\pi_k = \mathrm{Tr}\, S A_k^* A_k$, resulting in a posterior state of the system $S_k = A_k S A_k^* / \pi_k$. The entropies of the initial and posterior states are related by the Lindblad-Ozawa inequality

$$H(S) \ge \sum_k \pi_k H(S_k). \tag{8.41}$$

Proof. To prove this, take $\mathcal{H} = \mathcal{H}_1$ and consider a purification $|\psi\rangle$ of the state $S$ in $\mathcal{H}_1 \otimes \mathcal{H}_2$. Then

$$\pi_k S_k = \mathrm{Tr}_{\mathcal{H}_2} (A_k \otimes I) |\psi\rangle\langle\psi| (A_k \otimes I)^*,$$

and $S_2 = \mathrm{Tr}_{\mathcal{H}_1} |\psi\rangle\langle\psi| = \sum_k \pi_k S_2^k$, where

$$\pi_k S_2^k = \mathrm{Tr}_{\mathcal{H}_1} (A_k \otimes I) |\psi\rangle\langle\psi| (A_k \otimes I)^*.$$

Using Theorem 3.10 twice and applying the concavity of the quantum entropy,

$$H(S) = H(S_2) = H\Big(\sum_k \pi_k S_2^k\Big) \ge \sum_k \pi_k H(S_2^k) = \sum_k \pi_k H(S_k). \qquad \square$$

A particular case of the above is the following inequality for composite systems. Let $S_{12}$ be a state of the composite system 12 and let $\{|e_k\rangle\}$ be an orthonormal basis in $\mathcal{H}_2$. Taking $A_k = I_1 \otimes |e_k\rangle\langle e_k|$, one obtains

$$H(S_{12}) \ge \sum_k \pi_k H(S_1^k), \tag{8.42}$$

where $\pi_k S_1^k = \langle e_k | S_{12} | e_k \rangle$. This can be used to prove a particular case of the additivity conjecture.

Proposition 8.18. The maximal property (8.36) holds in the case where $\Phi_1 = \Phi$ is an arbitrary channel in $\mathcal{H}_1$ and $\Phi_2 = \mathrm{Id}_2$ is the ideal channel in $\mathcal{H}_2$.

Proof. Since $\hat{H}_{\mathrm{Id}_2}(S_2) = 0$, according to Exercise 8.3, we have to prove that

$$\hat{H}_{\Phi \otimes \mathrm{Id}_2}(S_{12}) \ge \hat{H}_\Phi(S_1). \tag{8.43}$$

Let $S_{12} = \sum_i \pi_i S_{12}^i$ be the optimal decomposition, i.e.

$$\hat{H}_{\Phi \otimes \mathrm{Id}_2}(S_{12}) = \sum_i \pi_i H\left((\Phi \otimes \mathrm{Id}_2)[S_{12}^i]\right).$$

By using (8.42), we obtain

$$\sum_i \pi_i H\left((\Phi \otimes \mathrm{Id}_2)[S_{12}^i]\right) \ge \sum_{ik} \pi_i \pi_{ik} H\left(\Phi[S_1^{ik}]\right), \tag{8.44}$$

where $\pi_{ik} S_1^{ik} = \langle e_k | S_{12}^i | e_k \rangle$. Now, we have the decomposition of $S_1$ into the states $S_1^{ik}$ with probabilities $\pi_i \pi_{ik}$ for channel $\Phi$. Hence, the right hand side of (8.44) is greater than or equal to $\hat{H}_\Phi(S_1)$. □

Proposition 8.19. The maximal property (8.36) (and hence the additivity of the minimal output entropy (8.33) and of the χ-capacity (8.22)) hold in the case where $\Phi_1 = \Phi$ is an entanglement-breaking channel (see Section 6.4) and $\Phi_2 = \Psi$ is an arbitrary channel.

Lemma 8.20. Let $\pi_j \ge 0$, $\sum_j \pi_j = 1$, and $S_1^j, S_2^j$ be arbitrary states of the systems 1, 2. Then

$$H\Big(\sum_j \pi_j S_1^j \otimes S_2^j\Big) \ge H\Big(\sum_j \pi_j S_1^j\Big) + \sum_j \pi_j H(S_2^j).$$

Proof. We introduce the notation

$$(S_{12})_\pi = \sum_j \pi_j S_1^j \otimes S_2^j; \qquad (S_1)_\pi = \sum_j \pi_j S_1^j.$$

Now, using the additivity of the entropy, the inequality can be rewritten as

$$H((S_{12})_\pi) - \sum_j \pi_j H(S_1^j \otimes S_2^j) \ge H((S_1)_\pi) - \sum_j \pi_j H(S_1^j),$$

which by identity (7.19) is equivalent to

$$\sum_j \pi_j H\left(S_1^j \otimes S_2^j; (S_{12})_\pi\right) \ge \sum_j \pi_j H\left(S_1^j; (S_1)_\pi\right).$$

However, the last inequality follows from the monotonicity of the relative entropy (with respect to partial trace in the second system). □
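
Both entropy inequalities of this subsection, the Lindblad-Ozawa inequality (8.41) and Lemma 8.20, lend themselves to a quick numerical check on random instances. The sketch below is illustrative only: the dimensions and the seed are arbitrary, the function names are hypothetical, and the measurement operators $A_k$ are obtained by cutting a random isometry into blocks, which is just one convenient way to satisfy $\sum_k A_k^* A_k = I$.

```python
import numpy as np

rng = np.random.default_rng(7)

def entropy(rho):
    """Von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def random_state(d):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    S = A @ A.conj().T
    return S / np.trace(S).real

def random_measurement(d, m):
    """m operators A_k with sum_k A_k^* A_k = I (blocks of an isometry)."""
    G = rng.normal(size=(m * d, d)) + 1j * rng.normal(size=(m * d, d))
    Q, _ = np.linalg.qr(G)              # Q has orthonormal columns
    return [Q[k * d:(k + 1) * d, :] for k in range(m)]

def lindblad_ozawa_gap(d=3, m=4):
    """H(S) - sum_k pi_k H(S_k); (8.41) says this is nonnegative."""
    S = random_state(d)
    gap = entropy(S)
    for A in random_measurement(d, m):
        post = A @ S @ A.conj().T       # unnormalized posterior state
        pk = np.trace(post).real
        if pk > 1e-12:
            gap -= pk * entropy(post / pk)
    return gap

def lemma_8_20_gap(d1=2, d2=3, m=3):
    """Left side minus right side of the inequality in Lemma 8.20."""
    pi = rng.random(m)
    pi /= pi.sum()
    S1 = [random_state(d1) for _ in range(m)]
    S2 = [random_state(d2) for _ in range(m)]
    joint = sum(p * np.kron(a, b) for p, a, b in zip(pi, S1, S2))
    avg1 = sum(p * a for p, a in zip(pi, S1))
    return entropy(joint) - entropy(avg1) - sum(p * entropy(b) for p, b in zip(pi, S2))

lo_gaps = [lindblad_ozawa_gap() for _ in range(20)]
lemma_gaps = [lemma_8_20_gap() for _ in range(20)]
print(min(lo_gaps), min(lemma_gaps))
```

Random trials cannot replace the proofs, but a negative gap would immediately signal a misreading of either statement.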

Proof. Coming back to the proof of the proposition, assume that $\Phi$ is an entanglement-breaking channel, so that $\Phi[S] = \sum_j S_1^j\, \mathrm{Tr}\, S M_j$, where $\{M_j\}$ is an observable in system 1. In this case,

$$(\Phi \otimes \mathrm{Id}_2)[S_{12}] = \sum_j p_j S_1^j \otimes S_2^j, \tag{8.45}$$

where $p_j S_2^j = \mathrm{Tr}_{\mathcal{H}_1} S_{12} (M_j \otimes I_2)$ for an arbitrary state $S_{12}$ of the composite system. Taking the partial trace, we obtain, in particular,

$$\Phi[S_1] = \sum_j p_j S_1^j, \qquad S_2 = \sum_j p_j S_2^j. \tag{8.46}$$

If $\Psi$ is an arbitrary channel in system 2,

$$(\Phi \otimes \Psi)[S_{12}] = \sum_j p_j S_1^j \otimes \Psi[S_2^j]. \tag{8.47}$$

We have

$$\hat{H}_{\Phi \otimes \Psi}(S_{12}) = \inf \sum_i \pi_i H\left((\Phi \otimes \Psi)[S_{12}^i]\right), \tag{8.48}$$

where $S_{12} = \sum_i \pi_i S_{12}^i$ is an arbitrary decomposition. By writing relation (8.45) for $S_{12}^i$ we obtain

$$(\Phi \otimes \mathrm{Id}_2)[S_{12}^i] = \sum_j p_{ij} S_1^j \otimes S_2^{ij},$$

along with the decompositions

$$S_1 = \sum_i \pi_i S_1^i, \qquad S_2 = \sum_{ij} \pi_i p_{ij} S_2^{ij}. \tag{8.49}$$

By using (8.47), the lemma that we just proved, and (8.46), we obtain

$$\begin{aligned} H\left((\Phi \otimes \Psi)[S_{12}^i]\right) &= H\Big(\sum_j p_{ij} S_1^j \otimes \Psi[S_2^{ij}]\Big) \\ &\ge H\Big(\sum_j p_{ij} S_1^j\Big) + \sum_j p_{ij} H\left(\Psi[S_2^{ij}]\right) \\ &\ge H\left(\Phi[S_1^i]\right) + \sum_j p_{ij} H\left(\Psi[S_2^{ij}]\right), \end{aligned}$$

which in combination with (8.48) and (8.49) implies

$$\hat{H}_{\Phi \otimes \Psi}(S_{12}) \ge \hat{H}_\Phi(S_1) + \hat{H}_\Psi(S_2).$$

□


8.3.4 Additivity for complementary channels 
Lemma 8.21. If $S$ is a pure state, then it holds for any channel $\Phi$,

$$H(\Phi[S]) = H(\tilde{\Phi}[S]), \tag{8.50}$$

where $\tilde{\Phi}$ is the complementary of $\Phi$. Therefore, if either of the properties (8.36), (8.33) holds for channels $\Phi_1, \Phi_2$, the similar property holds for the complementary channels $\tilde{\Phi}_1, \tilde{\Phi}_2$.

Proof. As follows from the definition (6.38) of the complementary channels, the states $\Phi[S]$ and $\tilde{\Phi}[S]$ are partial states of the pure state $V S V^*$. Hence, by equality (7.30), they have equal entropies.

From (8.50) we conclude

$$\check{H}(\Phi) = \check{H}(\tilde{\Phi}); \qquad \hat{H}_\Phi(S) = \hat{H}_{\tilde{\Phi}}(S)$$

for an arbitrary state $S$, due to the expressions for $\check{H}(\Phi)$, $\hat{H}_\Phi(S)$ in terms of the output entropies for pure input states. Hence, the second statement follows. □

Proposition 8.19 then implies

Corollary 8.22. The maximal property (8.36) (and hence the additivity of the minimal output entropy (8.33) and of the χ-capacity (8.22)) hold in the case where $\Phi_1 = \Phi$ is the channel (6.45), in particular, a dephasing channel, and $\Phi_2 = \Psi$ is an arbitrary channel.
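
The equality (8.50) that underlies Lemma 8.21 and Corollary 8.22 can be illustrated numerically. In the sketch below a channel and its complementary are both generated from one random isometry $V: \mathcal{H}_A \to \mathcal{H}_B \otimes \mathcal{H}_E$, taking partial traces over $E$ and over $B$ respectively; the dimensions, the seed, and the variable names are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

def entropy(rho):
    """Von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

dA, dB, dE = 3, 2, 4                     # requires dB * dE >= dA
G = rng.normal(size=(dB * dE, dA)) + 1j * rng.normal(size=(dB * dE, dA))
V, _ = np.linalg.qr(G)                   # isometry: V^* V = I_A

psi = rng.normal(size=dA) + 1j * rng.normal(size=dA)
psi /= np.linalg.norm(psi)               # pure input state |psi>

amp = (V @ psi).reshape(dB, dE)          # amplitudes of V|psi> in H_B (x) H_E
rho_B = amp @ amp.conj().T               # Phi[S]: partial trace over E
rho_E = amp.T @ amp.conj()               # complementary output: trace over B

print(entropy(rho_B), entropy(rho_E))
```

The two reduced states share the same nonzero spectrum (the squared Schmidt coefficients of $V|\psi\rangle$), so the entropies coincide even though the output dimensions differ.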

Proposition 8.23. Let $\Phi_2$ be an arbitrary channel. The maximal property (8.36) holds for $\Phi_1, \Phi_2$ if $\Phi_1$ is an orthogonal convex sum (see Definition 6.33) of the ideal channel and a channel $\Phi^{(0)}$ such that the maximal property holds for $\Phi^{(0)}$ and $\Phi_2$.

It follows that such a $\Phi_1$ fulfills the additivity of the minimal output entropy (8.33) and of the χ-capacity (8.22), with an arbitrary $\Phi_2$. For example, this holds for the erasure channel, because it is an orthogonal convex sum of the ideal channel and an entanglement-breaking channel for which the property (8.36) was established in Section 8.3.3.

Proof. Denote $\Phi^{(q)} = q\, \mathrm{Id} \oplus (1 - q) \Phi^{(0)}$. Now,

$$H(\Phi^{(q)}[S]) = q H(S) + (1 - q) H(\Phi^{(0)}[S]) + h_2(q),$$

where $h_2(q) = -q \log q - (1 - q) \log(1 - q)$ is the binary entropy and

$$\hat{H}_{\Phi^{(q)}}(S) = (1 - q) \hat{H}_{\Phi^{(0)}}(S) + h_2(q), \tag{8.51}$$

because the minimum in the expression for $\hat{H}_{\Phi^{(q)}}(S)$ is attained for an ensemble of pure states $S_j$ for which $H(S_j) = 0$. For an arbitrary channel $\Phi_2$ we have

$$\Phi^{(q)} \otimes \Phi_2 = q (\mathrm{Id} \otimes \Phi_2) \oplus (1 - q)(\Phi^{(0)} \otimes \Phi_2).$$

Then

$$\begin{aligned} \hat{H}_{\Phi^{(q)} \otimes \Phi_2}(S_{12}) &\ge q \hat{H}_{\mathrm{Id} \otimes \Phi_2}(S_{12}) + (1 - q) \hat{H}_{\Phi^{(0)} \otimes \Phi_2}(S_{12}) + h_2(q) \\ &\ge q \hat{H}_{\Phi_2}(S_2) + (1 - q)\left[\hat{H}_{\Phi^{(0)}}(S_1) + \hat{H}_{\Phi_2}(S_2)\right] + h_2(q), \end{aligned}$$

where we have used the superadditivity of $\hat{H}_{\mathrm{Id} \otimes \Phi_2}(S_{12})$, $\hat{H}_{\Phi^{(0)} \otimes \Phi_2}(S_{12})$ and the fact that $\hat{H}_{\mathrm{Id}}(S_1) = 0$. By using (8.51), we obtain that the right hand side is equal to $\hat{H}_{\Phi^{(q)}}(S_1) + \hat{H}_{\Phi_2}(S_2)$. □
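
The entropy identity for the orthogonal convex sum, $H(\Phi^{(q)}[S]) = q H(S) + (1-q) H(\Phi^{(0)}[S]) + h_2(q)$, is the starting point of the proof and is easy to confirm numerically. In the sketch below the direct sum is realized as a block-diagonal matrix, and $\Phi^{(0)}$ is taken, purely for illustration, to be a depolarizing channel with parameter $1/2$; this choice and the function names are assumptions and are not taken from the text.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def h2(q):
    """Binary entropy in bits for 0 < q < 1."""
    return float(-q * np.log2(q) - (1 - q) * np.log2(1 - q))

def phi0(S):
    """Illustrative choice of Phi^(0): depolarizing channel with p = 1/2."""
    d = S.shape[0]
    return 0.5 * S + 0.5 * np.trace(S) * np.eye(d) / d

def phi_q(S, q):
    """Orthogonal convex sum q*Id (+) (1-q)*Phi^(0) as a block-diagonal output."""
    d = S.shape[0]
    out = np.zeros((2 * d, 2 * d), dtype=complex)
    out[:d, :d] = q * S
    out[d:, d:] = (1 - q) * phi0(S)
    return out

rng = np.random.default_rng(5)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
S = A @ A.conj().T
S = S / np.trace(S).real
q = 0.3

lhs = entropy(phi_q(S, q))
rhs = q * entropy(S) + (1 - q) * entropy(phi0(S)) + h2(q)
print(lhs, rhs)
```

The identity holds because the spectrum of a block-diagonal operator is the union of the block spectra, each rescaled by its weight.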

Corollary 8.22 implies that Proposition 8.23 holds with the replacement of the ideal channel by its complementary, the completely depolarizing channel.

8.3.5 Nonadditivity of quantum entropy quantities

The quantum Rényi entropy of order $p > 1$ of a density operator $S$ is defined as

$$R_p(S) = \frac{1}{1 - p} \log \mathrm{Tr}\, S^p, \tag{8.52}$$

and in the limit $p \to 1$ the quantum Rényi entropies converge uniformly to the von Neumann entropy of a density operator $S$

$$\lim_{p \to 1} R_p(S) = -\mathrm{Tr}\, S \log S = H(S).$$

Defining the minimal output Rényi entropy of the channel $\Phi$ as

$$\check{R}_p(\Phi) = \min_{S \in \mathfrak{S}(\mathcal{H})} R_p(\Phi[S]),$$

one may ask for an additivity property similar to (8.33), namely

$$\check{R}_p(\Phi_1 \otimes \Phi_2) = \check{R}_p(\Phi_1) + \check{R}_p(\Phi_2). \tag{8.53}$$

Again, inequality $\le$ is obvious here. Note that the validity of (8.53) for some specific channels $\Phi_1, \Phi_2$ and $p$ close to 1 implies (8.33) for these $\Phi_1, \Phi_2$. This raised interest in the problem (8.53) in the hope that its solution could shed light on the additivity problems in quantum information theory. In particular, proving it globally would imply, via Theorem 8.17, a global positive solution for all the additivity conjectures.

However, a counterexample of the transpose-depolarizing channel [216] soon appeared:

$$\Phi[S] = \frac{1}{d - 1}\left(I - S^{\top}\right), \tag{8.54}$$

for which (8.53) with $\Phi_1 = \Phi_2 = \Phi$ fails to hold for $d = \dim \mathcal{H} \ge 3$ and large enough $p$.

Exercise 8.24. Prove the complete positivity of the map (8.54) by establishing the Kraus representation

$$\Phi[S] = \frac{1}{2(d - 1)} \sum_{j,k=1}^{d} \left(|e_j\rangle\langle e_k| - |e_k\rangle\langle e_j|\right) S \left(|e_k\rangle\langle e_j| - |e_j\rangle\langle e_k|\right), \tag{8.55}$$

where $\{e_j\}$ is an orthonormal basis.
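
Before attempting Exercise 8.24 it can be reassuring to see the two expressions for the transpose-depolarizing channel agree numerically. The sketch below compares the defining formula (8.54), written with $I\,\mathrm{Tr}\,S$ so that it also applies to unnormalized arguments, against the Kraus sum (8.55) with the prefactor $\frac{1}{2(d-1)}$; the function names and the test dimension are illustrative assumptions.

```python
import numpy as np

def wh_direct(S):
    """(8.54): (I * Tr S - S^T) / (d - 1)."""
    d = S.shape[0]
    return (np.trace(S) * np.eye(d) - S.T) / (d - 1)

def wh_kraus(S):
    """(8.55): sum over antisymmetric operators A_jk = |e_j><e_k| - |e_k><e_j|."""
    d = S.shape[0]
    E = np.eye(d)
    out = np.zeros((d, d), dtype=complex)
    for j in range(d):
        for k in range(d):
            A = np.outer(E[j], E[k]) - np.outer(E[k], E[j])
            out += A @ S @ A.conj().T
    return out / (2 * (d - 1))

rng = np.random.default_rng(2)
d = 4
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
S = G @ G.conj().T
S = S / np.trace(S).real

direct = wh_direct(S)
kraus = wh_kraus(S)
print(np.max(np.abs(direct - kraus)))
```

Agreement on a random state is of course not a proof; the exercise asks for the algebraic identity, which follows by expanding the four terms of each summand.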

A breakthrough in the negative solution of conjecture (8.53) came in 2007. Winter [225] showed that in very high dimensions there always exist channels for which (8.53) does not hold for any $p > 2$. Next, Hayden [80] proved this for any $1 < p < 2$. Winter showed that the additivity of the Rényi entropy with $p > 2$ breaks for the pair of channels $\Phi, \bar{\Phi}$ (complex conjugate in a fixed basis), where

$$\Phi[S] = \frac{1}{n} \sum_{j=1}^{n} U_j S U_j^* \tag{8.56}$$

is the uniform mixture of unitary evolutions in $\mathcal{H}$ and $n$, $d = \dim \mathcal{H}$ are large enough.

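The proof relies upon two estimates. A simple but efficient upper bound for $\check{R}_p(\Phi \otimes \bar{\Phi})$ is given by the inequality

$$\check{R}_p(\Phi \otimes \bar{\Phi}) \le R_p\left((\Phi \otimes \bar{\Phi})[|\Omega\rangle\langle\Omega|]\right) \le \frac{p}{p - 1} \log n, \tag{8.57}$$

where $|\Omega\rangle$ is the maximally entangled vector in the same basis. Indeed,

$$\begin{aligned} (\Phi \otimes \bar{\Phi})[|\Omega\rangle\langle\Omega|] &= \frac{1}{n^2} \sum_{j,k=1}^{n} (U_j \otimes \bar{U}_k) |\Omega\rangle\langle\Omega| (U_j \otimes \bar{U}_k)^* \\ &= \frac{1}{n} |\Omega\rangle\langle\Omega| + \frac{1}{n^2} \sum_{j \ne k} (U_j \otimes \bar{U}_k) |\Omega\rangle\langle\Omega| (U_j \otimes \bar{U}_k)^* \\ &\ge \frac{1}{n} |\Omega\rangle\langle\Omega|, \end{aligned}$$

where in the second equality we used the property $(U \otimes \bar{U}) |\Omega\rangle = |\Omega\rangle$ for arbitrary unitary $U$. It follows that the maximal eigenvalue of the density operator $(\Phi \otimes \bar{\Phi})[|\Omega\rangle\langle\Omega|]$, i.e. its operator norm, is greater than or equal to $\frac{1}{n}$:

$$\left\| (\Phi \otimes \bar{\Phi})[|\Omega\rangle\langle\Omega|] \right\| \ge \frac{1}{n}. \tag{8.58}$$

Hence,

$$R_p\left((\Phi \otimes \bar{\Phi})[|\Omega\rangle\langle\Omega|]\right) \le \frac{1}{1 - p} \log \left(\frac{1}{n}\right)^{p} = \frac{p}{p - 1} \log n, \tag{8.59}$$

and (8.57) follows.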
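
The estimates (8.58) and (8.59) hold for every choice of the unitaries $U_j$, so they can be observed on a small random instance. The sketch below is only an illustration: the dimensions are tiny compared with the regime of the theorem, the Haar sampling via a QR decomposition with phase correction is a standard recipe assumed here rather than taken from the text, and the names `haar_unitary` and `renyi` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(11)

def haar_unitary(d):
    """Haar-distributed unitary from the QR decomposition of a Gaussian matrix."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    Q, R = np.linalg.qr(G)
    phases = np.diag(R) / np.abs(np.diag(R))
    return Q * phases                     # fix column phases

def renyi(rho, p):
    """Quantum Renyi entropy (8.52) with base-2 logarithm."""
    ev = np.clip(np.linalg.eigvalsh(rho), 0.0, None)
    return float(np.log2(np.sum(ev ** p)) / (1.0 - p))

d, n, p = 3, 4, 3.0
Us = [haar_unitary(d) for _ in range(n)]
omega = np.eye(d).reshape(d * d) / np.sqrt(d)     # maximally entangled vector

out = np.zeros((d * d, d * d), dtype=complex)     # (Phi (x) conj Phi)[|Omega><Omega|]
for Uj in Us:
    for Uk in Us:
        v = np.kron(Uj, Uk.conj()) @ omega
        out += np.outer(v, v.conj()) / n**2

top_eigenvalue = np.linalg.eigvalsh(out)[-1]
renyi_output = renyi(out, p)
renyi_bound = p / (p - 1) * np.log2(n)
print(top_eigenvalue, 1 / n)
print(renyi_output, renyi_bound)
```

The $n$ diagonal terms $j = k$ all reproduce $|\Omega\rangle\langle\Omega|$, which is exactly where the weight $1/n$ on the maximally entangled vector comes from.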

The next step is to consider random independent unitary operators $U_j$, $j = 1, \ldots, n$, distributed according to the normalized Haar measure on the group of unitaries in $\mathcal{H}$. In this case, the expectation of each term in (8.56) is $\mathsf{E}\, U_j S U_j^* = I/d$, since it is the only density operator that is invariant under unitary conjugations, and one can expect that $\Phi[S]$ is close to $I/d$ in a probabilistic sense for $n$ large enough. A key ingredient of the proof is the large deviation estimate for the sums of random operators, inspired by the classical Bernstein-Chernoff-Hoeffding inequality, which implies that the probability of the inequality

$$\frac{1 - \varepsilon}{d} I \le \Phi[S] \le \frac{1 + \varepsilon}{d} I \quad \text{for all input states } S \tag{8.60}$$

tends to 1 when $d \to \infty$, $n \sim d \log d$ (compare this result with Theorem 10.34). From inequality (8.60) it follows that $\|\Phi[S]\| \le \frac{1 + \varepsilon}{d}$ and hence

$$\mathrm{Tr}\, \Phi[S]^p = \mathrm{Tr}\, \Phi[S]\, \Phi[S]^{p-1} \le \|\Phi[S]\|^{p-1}\, \mathrm{Tr}\, \Phi[S] \le \left(\frac{1 + \varepsilon}{d}\right)^{p-1},$$

so that

$$\check{R}_p(\Phi) = -\frac{1}{p - 1} \max_S \log \mathrm{Tr}\, \Phi[S]^p \ge \log \frac{d}{1 + \varepsilon}.$$

This implies the probabilistic lower bound

$$\lim_{d \to \infty} \mathsf{P}\left\{ \check{R}_p(\Phi) \ge \log \frac{d}{1 + \varepsilon} \right\} = 1$$

for random $\Phi$ of the form (8.56), with $n \sim d \log d$. Taking into account that $\check{R}_p(\bar{\Phi}) = \check{R}_p(\Phi)$, and comparing with (8.57), one sees that with probability close to 1,

$$\check{R}_p(\Phi \otimes \bar{\Phi}) \le \frac{p}{p - 1} \log n < 2 \log\left(\frac{d}{1 + \varepsilon}\right) \le \check{R}_p(\Phi) + \check{R}_p(\bar{\Phi})$$

if $p > 2$, $d \to \infty$, $n \sim d \log d$.

The estimate (8.58) can be generalized to an arbitrary channel $\Phi$ as follows:

Lemma 8.25. Let $\Phi$ be a quantum channel from $A$ to $B$ which has the Kraus representation with $d_E$ components, and let $|\Omega_{AA}\rangle$ be the maximally entangled vector in $\mathcal{H}_A \otimes \mathcal{H}_A$. Then

$$\left\| (\Phi \otimes \bar{\Phi})[|\Omega_{AA}\rangle\langle\Omega_{AA}|] \right\| \ge \frac{d_A}{d_B d_E} \equiv D. \tag{8.62}$$

Proof. We have

$$(\Phi \otimes \bar{\Phi})[|\Omega_{AA}\rangle\langle\Omega_{AA}|] = \sum_{r,s=1}^{d_E} (V_r \otimes \bar{V}_s) |\Omega_{AA}\rangle\langle\Omega_{AA}| (V_r \otimes \bar{V}_s)^*$$

and

$$\langle\Omega_{BB}| (X \otimes Y) |\Omega_{AA}\rangle = \frac{1}{\sqrt{d_A d_B}}\, \mathrm{Tr}\, Y^{\top} X,$$

for any operators $X, Y$ acting from $\mathcal{H}_A$ to $\mathcal{H}_B$. By using these identities, we have

$$\begin{aligned} \left\| (\Phi \otimes \bar{\Phi})[|\Omega_{AA}\rangle\langle\Omega_{AA}|] \right\| &\ge \langle\Omega_{BB}| (\Phi \otimes \bar{\Phi})[|\Omega_{AA}\rangle\langle\Omega_{AA}|] |\Omega_{BB}\rangle \\ &= \sum_{r,s=1}^{d_E} \left| \langle\Omega_{BB}| (V_r \otimes \bar{V}_s) |\Omega_{AA}\rangle \right|^2 \\ &= \frac{1}{d_A d_B} \sum_{r,s=1}^{d_E} \left| \mathrm{Tr}\, V_s^* V_r \right|^2 \\ &\ge \frac{1}{d_A d_B} \sum_{r=1}^{d_E} \left| \mathrm{Tr}\, V_r^* V_r \right|^2 \\ &\ge \frac{1}{d_A d_B d_E} \left| \sum_{r=1}^{d_E} \mathrm{Tr}\, V_r^* V_r \right|^2 \\ &= \frac{1}{d_A d_B d_E} \left| \mathrm{Tr}\, I_A \right|^2 = \frac{d_A}{d_B d_E}, \end{aligned}$$

where the Cauchy-Schwarz inequality was used. □

This estimate is one part of proving that in very high dimensions there exist channels that violate the additivity of the minimal Rényi entropies for all $p > 1$ (Winter and Hayden [83]). In this case, one considers the pair of channels $\Phi, \bar{\Phi}$, where $\Phi$ is a generic channel given by a general open system representation (6.22) with the random unitary evolution operator uniformly distributed over the group of unitaries. Moreover, $d_B = d_E = d \to \infty$, and $d_A \sim d^{1 + 1/p}$ so that $D \sim d^{1/p - 1}$. The upper bound

$$\check{R}_p(\Phi \otimes \bar{\Phi}) \le \log d \tag{8.63}$$

then follows from (8.62) similarly to (8.59).

The most difficult part of the proof is the probabilistic lower estimate

$$\lim_{d \to \infty} \mathsf{P}\left\{ \check{R}_p(\Phi) \ge \log d - c \right\} = 1, \tag{8.64}$$

which is also based on a large deviation phenomenon, namely the measure concentration, used to establish the concentration of the output entropy. This is closely related to the early observations that, given a random uniformly distributed pure state vector $|\psi_{AB}\rangle$ of a composite system $AB$, the entropy of the partial state $S_B$ of the smaller system tends to its maximal value $\log d_B$ as $d_A \to \infty$, and hence $S_B$ becomes almost chaotic while $|\psi_{AB}\rangle$ is almost maximally entangled.

The relations (8.63) and (8.64), together with $\check{R}_p(\bar{\Phi}) = \check{R}_p(\Phi)$, imply that

$$\check{R}_p(\Phi \otimes \bar{\Phi}) < \check{R}_p(\Phi) + \check{R}_p(\bar{\Phi})$$

with probability close to 1.
Notably, these estimates for $p > 1$ are not strong enough to imply the violation
of additivity for $p \to 1$, and the case of the minimal von Neumann entropy requires
additional efforts, which were accomplished by Hastings [76]. His original argument
was improved and simplified in subsequent works [64], [30], and [11]. Following [30],
consider the random unitary evolution channels with $d_B = d \to \infty$, and $d_E = d_A \gg d$.
In this case $D = 1/d$, and Lemma 8.25 implies that the density operator
$(\Phi\otimes\bar\Phi)\big[|\Omega_{A\bar A}\rangle\langle\Omega_{A\bar A}|\big]$, acting in the space of dimensionality $d^2$, has an
eigenvalue greater than or equal to $1/d$.

Exercise 8.26. Let $P = [\lambda_1, \dots, \lambda_{d^2}]$ run through all probability distributions
such that $\lambda_1 \ge 1/d$. In this case,

$$\max_P H(P) = \log d + \frac{d-1}{d}\log(d+1) < 2\log d - \frac{\log(d/e)}{d},$$

where the maximum is attained for the distribution $P$ such that $\lambda_1 = 1/d$ and all
other probabilities are equal to $1/d(d+1)$.
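
The value in Exercise 8.26 can be checked numerically. The sketch below is only an
illustration, not part of the text's argument: it evaluates the Shannon entropy (in bits) of
the distribution with one atom $1/d$ and $d^2 - 1$ atoms $1/d(d+1)$, and compares it with
$\log d + \frac{d-1}{d}\log(d+1)$ and with the bound $2\log d - \log(d/e)/d$. The function names
are hypothetical, and the code does not verify that this distribution is the maximizer.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def extremal_distribution(d):
    """Distribution on d*d points: one atom 1/d, the rest equal to 1/(d(d+1))."""
    return [1.0 / d] + [1.0 / (d * (d + 1))] * (d * d - 1)

def closed_form(d):
    """log d + ((d-1)/d) log(d+1), the value stated in the exercise."""
    return math.log2(d) + (d - 1) / d * math.log2(d + 1)

def upper_bound(d):
    """2 log d - (1/d) log(d/e), the bound stated in the exercise."""
    return 2 * math.log2(d) - math.log2(d / math.e) / d

if __name__ == "__main__":
    for d in (2, 5, 50):
        h = shannon_entropy(extremal_distribution(d))
        print(d, round(h, 6), round(closed_form(d), 6), round(upper_bound(d), 6))
```

For each $d$ the first two printed numbers should agree, and both should lie below the third.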


It follows that

$$\check H(\Phi\otimes\bar\Phi) \le H\big((\Phi\otimes\bar\Phi)\big[|\Omega_{A\bar A}\rangle\langle\Omega_{A\bar A}|\big]\big) < 2\log d - \frac{\log(d/e)}{d}, \tag{8.65}$$

which gives the “easier” part of the argument. The difficult part is again the
probabilistic estimate, exploring at full strength the phenomenon of entropy concentration:

$$\mathsf{P}\left\{\check H(\Phi) \ge \log d - \frac{C}{d}\right\} > 0$$

for $d$, $d_A/d$ large enough, where $C$ is a positive constant. Together with (8.65), this
proves the existence of channels violating the additivity of the minimal von Neumann
entropy.

Although, when combined with Theorem 8.17, this gives a definite negative answer
to all global additivity conjectures, including the one which concerns the $\chi$-capacity,
several important issues remain open. All the above proofs use the technique of random
unitary operators or random states and, as such, are not constructive. They only
provide evidence for the existence of counterexamples but do not allow us to actually
produce them. Attempts at providing estimates for the dimensions in which nonadditivity
can happen have so far led to overwhelmingly high values. The detailed estimates made
in [64] gave $d \approx 3.9 \times 10^{4}$, $d_A \approx 7.8 \times 10^{32}$, breaking the additivity by a quantity
of the order of $10^{-5}$. While this does not exclude the possibility of better estimates,
perhaps to be based on a different (but yet unknown) approach, it casts doubt on finding
concrete counterexamples by computer simulation of random unitary channels. It
remains a mystery what happens in realistic dimensions. Perhaps, for some unknown
reason, the additivity still holds generically or its violation is so tiny that it cannot be
caught by numerical simulations.

From this point of view, the following explicit construction is of interest. Following [73],
we shall consider an explicit example of a simple channel closely related
to (8.54), for which the minimal Rényi entropies are nonadditive for all $p > 2$ and
sufficiently large $d$. This more recent explicit construction thus achieves the same
goal as Winter’s first proposal (8.56). However, it is not clear if it could be extended
to the most interesting range $p > 1$.

Exercise 8.27. By using formula (6.42) and the Kraus representation (8.55),
show that the complementary channel of (8.54) is

$$\tilde\Phi[S] = \frac{2}{d-1}\,P_-(S\otimes I_2)P_-, \tag{8.66}$$

where $P_- = \frac{I_{12} - F}{2}$ is the projector onto the antisymmetric subspace $\mathcal H_-$ of
$\mathcal H\otimes\mathcal H$, which has dimensionality $\frac{d(d-1)}{2}$, and $F$ is the flip operator acting as

$$F(|\psi_1\rangle\otimes|\psi_2\rangle) = |\psi_2\rangle\otimes|\psi_1\rangle. \tag{8.67}$$

By Lemma 8.21, this channel shares the additivity properties with the channel (8.54).

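Now consider the completely positive map $\frac{d-1}{2}\tilde\Phi^*$, where $\tilde\Phi^*$ is the dual to the
channel (8.66):

$$S_{12} \to \frac{d-1}{2}\,\tilde\Phi^*[S_{12}] = \operatorname{Tr}_2 P_- S_{12} P_-. \tag{8.68}$$

Its restriction to operators with support in the subspace $\mathcal H_-$ is trace-preserving and
hence it is a channel, which we denote $\Phi_-$. For this channel, the input dimensionality
is $\dim\mathcal H_- = \frac{d(d-1)}{2}$, while the output dimensionality is $d$ and the environment
dimensionality also is $d$, because (8.68) involves the partial trace with respect to the
second copy of $\mathcal H$. Thus, we have $d_A = \frac{d(d-1)}{2}$, $d_B = d_E = d$, whence $D = \frac{d-1}{2d}$
in (8.62), so that

$$\operatorname{Tr}\big((\Phi_-\otimes\bar\Phi_-)[|\Omega\rangle\langle\Omega|]\big)^p \ge \big\|(\Phi_-\otimes\bar\Phi_-)[|\Omega\rangle\langle\Omega|]\big\|^p \ge D^p,$$

and

$$\check R_p(\Phi_-\otimes\bar\Phi_-) \le -\frac{p}{p-1}\log D. \tag{8.69}$$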
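The channel (8.68) breaks the additivity of the quantum Rényi entropies for $p > 2$:

Proposition 8.28. For any $p > 2$ and $d > \left(1 - 2^{-(p-2)/p}\right)^{-1}$,

$$\check R_p(\Phi_-\otimes\Phi_-) < \check R_p(\Phi_-) + \check R_p(\Phi_-).$$

Proof. Let us show that

$$\check R_p(\Phi_-) = 1. \tag{8.70}$$

Let $S_{12} = |\psi_{12}\rangle\langle\psi_{12}|$ be a pure input state of the channel (8.68), where the vector
$|\psi_{12}\rangle \in \mathcal H_-$ has the Schmidt decomposition $|\psi_{12}\rangle = \sum_{j=1}^d \sqrt{\lambda_j}\,|e_j\otimes h_j\rangle$. From
the definition (8.68) of the channel,

$$\Phi_-[S_{12}] = S_1 = \sum_{j=1}^d \lambda_j |e_j\rangle\langle e_j|.$$

We have

$$\begin{aligned}
\lambda_j &= |\langle\psi_{12}|e_j\otimes h_j\rangle|^2 \\
&\le \langle e_j\otimes h_j|P_-|e_j\otimes h_j\rangle \\
&= \tfrac12\langle e_j\otimes h_j|(I_{12} - F)|e_j\otimes h_j\rangle \\
&= \tfrac12\left(1 - |\langle e_j|h_j\rangle|^2\right) \le \tfrac12,
\end{aligned}$$

with the equality in the last inequality if $\langle e_j|h_j\rangle = 0$. Therefore,

$$\|\Phi_-[S_{12}]\| = \max_j \lambda_j \le \frac12.$$

Hence, similar to (8.61),

$$\operatorname{Tr}\Phi_-[S_{12}]^p = \operatorname{Tr}\Phi_-[S_{12}]^{p-1}\Phi_-[S_{12}] \le \|\Phi_-[S_{12}]\|^{p-1}\operatorname{Tr}\Phi_-[S_{12}] \le 2^{-(p-1)},$$

and $R_p(\Phi_-[S_{12}]) = \frac{1}{1-p}\log\operatorname{Tr}\Phi_-[S_{12}]^p \ge 1$. The equality is attained for
$|\psi_{12}\rangle = \frac{1}{\sqrt2}(|e\rangle\otimes|h\rangle - |h\rangle\otimes|e\rangle)$, where $\langle e|h\rangle = 0$. This proves (8.70).

Now, (8.69) together with $\Phi_- = \bar\Phi_-$ implies

$$\check R_p(\Phi_-\otimes\Phi_-) \le \frac{p}{p-1}\log\frac{2d}{d-1} < 2 = \check R_p(\Phi_-) + \check R_p(\Phi_-),$$

provided $p$, $d$ satisfy the conditions of the proposition.

□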
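
As an illustration of when the final inequality of the proof holds, the sketch below evaluates
the bound $\frac{p}{p-1}\log\frac{2d}{d-1}$ and compares it with $2$. It assumes binary logarithms
(so that the minimal output Rényi entropy of a single channel equals 1), and it uses the
rearrangement $d > (1 - 2^{-(p-2)/p})^{-1}$ of the condition $\frac{p}{p-1}\log\frac{2d}{d-1} < 2$,
which is derived here and should be checked against the statement of the proposition. The
function names are hypothetical.

```python
import math

def renyi_upper_bound(p, d):
    """Upper bound p/(p-1) * log2(2d/(d-1)) on the Renyi entropy of the product channel."""
    return p / (p - 1) * math.log2(2 * d / (d - 1))

def dimension_threshold(p):
    """Value (1 - 2^{-(p-2)/p})^{-1}; the bound drops below 2 for d above it."""
    return 1.0 / (1.0 - 2.0 ** (-(p - 2) / p))

def violates_additivity(p, d):
    """True if the bound is strictly below 2, the sum of the two single-channel values."""
    return renyi_upper_bound(p, d) < 2.0

if __name__ == "__main__":
    for p in (2.5, 3.0, 4.0, 10.0):
        d0 = dimension_threshold(p)
        d = math.floor(d0) + 1  # smallest integer strictly above the threshold
        print(p, round(d0, 3), d, violates_additivity(p, d))
```

The threshold grows without bound as $p \to 2$, which matches the restriction $p > 2$ in the
proposition.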
8.4 Notes and references

1. The memoryless channel model is the simplest, yet is basic in information theory.
Already for this case, the coding theorems are not at all trivial. At the same time they
give a basis for considering more complicated and realistic scenarios, taking into account
memory effects. Moreover, the effects of entanglement are demonstrated in the
memoryless case in a most spectacular way. Therefore, in our presentation throughout
the book we concentrate on the memoryless configurations. Proposition 8.2, proved
in the paper [98], was later generalized to memory channels by Kretschmann and
Werner [141]. A generalization to arbitrary channels in the spirit of Han and Verdú’s
information spectrum approach was given by Hayashi and Nagaoka [79], [78].

2. The fruitful connection between the $\chi$-capacity and the relative entropy, in particular
the maximal distance property, was observed by Schumacher and Westmoreland [179].

3. One of the first mentions of the additivity problem was in the paper of Bennett,
Fuchs and Smolin [22]. The formulation in terms of the four capacities was given by
Bennett and Shor [23]. Remarkably, as shown by Shor [192], there is an intermediate
capacity, which is obtained by allowing adaptive measurement on the output, and
which lies strictly between $C_{1,1}$ and $C_{1,\infty}$ for the “lifted trines” channel.

Theorem 8.17 consists of several implications, some of which are individual, i.e. hold
for any fixed pair of channels, while others hold only globally, e.g. “if the minimal
output entropy is additive for all channels, so is the (constrained) $\chi$-capacity”.
Matsumoto, Shimono, and Winter [157] observed the relation between a constrained version
of the $\chi$-capacity and the entanglement of formation. Audenaert and Braunstein [13]
demonstrated the relevance of the methods of convex analysis, while the
most profound global implications were established by Shor, who developed an ingenious
extension construction that allows one to strengthen the individual channel’s
properties. See the paper of Shor [193] for a detailed account as well as for further
references. Additional information on the equivalences of various additivity properties
can be found in Chapter 9 of Hayashi’s book [78]. The fact that global validity of the
property (8.23) implies additivity (8.22) for all pairs of channels $\Phi_1$, $\Phi_2$ (and a similar
implication for the minimal output entropy) was proved by Fukuda and Wolf [65].

The relation between the additivity properties of complementary channels was noticed
and studied by Holevo [103] and by King, Matsumoto, Natanson, and Ruskai [132].
Proposition 8.23 is proved in the paper of Holevo and Shirokov [109], where a mathematical
description was given to the channel extension construction, and the superadditivity
conjecture (8.36) and its equivalence to the additivity of constrained $\chi$-capacity
were studied systematically.

The Lindblad-Ozawa inequality was established in its final form in [162]. The validity
of the additivity conjecture for entanglement-breaking channels was established
by Shor [190]. Other notable results include King’s proofs of conjectures (8.22)
and (8.33) for unital qubit [129] and depolarizing channels, as well as their complementary
channels [103], [132]. Most of these results can be derived by proving
the additivity of the minimal output Rényi entropies (8.53) for all values of $p$. A
significant role in these proofs is played by the Lieb-Thirring inequality, see [27],
Section X.2. For a detailed account of these and other partial results on the additivity
problem, see the survey [104].

The counterexample of a transpose-depolarizing channel was proposed by Werner and
Holevo [216]. A breakthrough came with the results of Winter [225] and Hayden [80],
who showed that the additivity of the minimal Rényi entropy with parameter values
$p > 2$ and $1 < p < 2$, correspondingly, does not hold in sufficiently high dimensions.
The phenomenon of entropy concentration was explored in [82], based on a noncommutative
generalization of the Bernstein-Chernoff-Hoeffding inequality. Building
on the methods of these works, Hastings [76] considered mixtures of random unitary
operators and sketched a proof of the violation of additivity for the minimal von
Neumann entropy for large $n$, $n/d$. By Theorem 8.17, this implies the existence of
counterexamples to all forms of the additivity, as well as to equality (8.24), as follows
from [65]. The original argument of Hastings was improved and simplified in the
subsequent works of Fukuda, King, and Moser [64], and Brandao and M. Horodecki [30].
Aubrun, Szarek, and E. Werner [11] pointed out a profound connection of these results
to Dvoretzky-Milman’s Theorem in the asymptotic theory of finite-dimensional
Banach spaces. The constructive counterexample to the additivity of minimum output
Rényi entropy of quantum channels for $p > 2$ is due to Grudka, M. Horodecki, and
Pankowski [73].

An important corollary of these negative results is that, in general, the classical
capacity $C(\Phi) \ne C_\chi(\Phi)$. By definition, $C(\Phi^{\otimes n}) = nC(\Phi)$. However, whether the
classical capacity $C(\Phi)$ can be nonadditive for a pair of different quantum channels
$\Phi_1$, $\Phi_2$ remains an open question.
Entanglement-assisted classical communication


9.1 The gain of entanglement assistance 

The construction of the superdense coding protocol (Section 3.3.3) allows for generalization
to the case of a non-ideal channel $\Phi$ between arbitrary finite-dimensional
quantum systems $A$ and $A'$. Consider the following protocol of classical information
transmission through the channel $\Phi$. Systems $A$ (transmitter) and $B$ (receiver)
share an entangled (pure) state $S_{AB}$, which is distributed to them with a procedure of
the kind described in Section 3.2.1. We shall assume that $\dim\mathcal H_A \le \dim\mathcal H_B$. System
$A$ does some encoding of classical messages $i$, arriving with probabilities $\pi_i$, into
operations (channels) $\mathcal E_A^i$, acting in $\mathcal H_A$. These operations are applied by $A$ to its part of
the state $S_{AB}$, so that the state of the system $AB$ is transformed into $(\mathcal E_A^i\otimes\mathrm{Id}_B)[S_{AB}]$.
After that, $A$ sends its part of this state through the channel $\Phi$, thus driving the whole
system $AB$ into the state $(\Phi\circ\mathcal E_A^i\otimes\mathrm{Id}_B)[S_{AB}]$, accessible to $B$. Next, $B$ makes a
measurement in the space $\mathcal H_{A'B}$ in order to extract the classical information. In fact,
block encoding is allowed, so that the whole of this description refers to the $n$-th tensor
degree of $\mathcal H_{A'B}$. We are interested in the classical capacity of this protocol, which
is called the entanglement-assisted classical capacity of the channel $\Phi$.

The maximum over measurements of $B$ can be evaluated using the Coding Theorem 5.4
for the classical capacity. First, we introduce the quantity

$$C_{ea}^{(1)}(\Phi) = \sup_{\pi_i,\,\mathcal E_A^i,\,S_{AB}} \chi\big(\{\pi_i;\ (\Phi\otimes\mathrm{Id}_B)[S_{AB}^i]\}\big), \tag{9.1}$$

which takes into account measurements in the system $A'B$. Using the channel $n$ times
and allowing arbitrary collective (entangled) measurement on $B$’s side, one gets

$$C_{ea}^{(n)}(\Phi) = C_{ea}^{(1)}(\Phi^{\otimes n}). \tag{9.2}$$

Now, the full entanglement-assisted classical capacity is

$$C_{ea}(\Phi) = \lim_{n\to\infty}\frac1n\,C_{ea}^{(1)}(\Phi^{\otimes n}). \tag{9.3}$$

The following theorem gives a simple “one-letter” expression for $C_{ea}(\Phi)$ in terms of
quantum mutual information.
Theorem 9.1 (Bennett, Shor, Smolin and Thapliyal [24]). The entanglement-assisted
classical capacity of the channel $\Phi$ is equal to

$$C_{ea}(\Phi) = \max_{S_A} I(S_A, \Phi) = \max_{S_A}\big[H(S_A) + H(\Phi(S_A)) - H(S_A; \Phi)\big]. \tag{9.4}$$

The proof of this theorem will be given in the following sections. Here, we wish to
compare entanglement-assisted and (unassisted) classical capacities.

Example 9.2. Consider the quantum erasure channel $\Phi_p$. We have

$$H(\Phi_p[S_A]) = (1 - p)H(S_A) + h_2(p).$$

Using the fact that the complementary channel is $\tilde\Phi_p = \Phi_{1-p}$ (see Exercise 6.35),
together with Definition 7.28 of the entropy exchange, we have $H(S_A; \Phi_p) =
H(\tilde\Phi_p[S_A]) = pH(S_A) + h_2(p)$. Hence,

$$I(S_A, \Phi_p) = H(S_A) + (1 - p)H(S_A) - pH(S_A) = 2(1 - p)H(S_A),$$

whence $C_{ea}(\Phi_p) = 2(1 - p)\log d$, which is twice as large as the classical capacity
of the erasure channel.
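
The computation in Example 9.2 can be reproduced numerically for a state given by its
eigenvalues. The sketch below is illustrative only and rests on two assumptions that are
stated here rather than taken from the text: the output of the erasure channel has spectrum
$\{(1-p)\lambda_j\} \cup \{p\}$, and the entropy exchange equals the output entropy of the
erasure channel with parameter $1 - p$. The function names are hypothetical.

```python
import math

def entropy(probs):
    """Entropy in bits of a spectrum; zero eigenvalues are skipped."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

def erasure_output_spectrum(eigs, p):
    """Assumed spectrum of the erasure channel output: the input survives with
    weight 1 - p, and the orthogonal erasure flag carries weight p."""
    return [(1 - p) * lam for lam in eigs] + [p]

def mutual_information(eigs, p):
    """I(S, Phi_p) = H(S) + H(Phi_p[S]) - H(S; Phi_p), with the entropy exchange
    computed from the complementary channel, assumed to be Phi_{1-p}."""
    h_in = entropy(eigs)
    h_out = entropy(erasure_output_spectrum(eigs, p))
    h_exchange = entropy(erasure_output_spectrum(eigs, 1 - p))
    return h_in + h_out - h_exchange

if __name__ == "__main__":
    eigs = [0.5, 0.3, 0.2]
    for p in (0.0, 0.25, 0.7):
        print(p, mutual_information(eigs, p), 2 * (1 - p) * entropy(eigs))
```

For the chaotic state in dimension $d$ the same function returns $2(1-p)\log d$, the capacity
stated in the example.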

Proposition 9.3. If the channel $\Phi$ is covariant, the maximum in (9.4) is attained for
an invariant state $S_A$. In particular, if $\Phi$ is irreducibly covariant, it is attained on the
chaotic state.

Proof. Let the channel $\Phi$ satisfy the covariance condition (6.48). In this case,

$$I(V_g^A S_A V_g^{A*}, \Phi) = I(S_A, \Phi). \tag{9.5}$$

This follows from the expression (7.52) for the mutual information, unitary covariance
of the input and output entropies, and the corresponding property of the entropy
exchange,

$$H(V_g^A S_A V_g^{A*}, \Phi) = H(S_A, \Phi). \tag{9.6}$$

To prove this last property, consider formula (7.47) for the entropy exchange and note
that purification of the state $V_g^A S_A V_g^{A*}$ is given by the vector $(V_g^A\otimes I_R)|\psi_{AR}\rangle$.
Substituting this into (7.47) and again using the covariance of $\Phi$ and the unitary invariance
of the entropy gives (9.6).

Now, let $S_A$ be an optimal state for the channel $\Phi$. Consider the $V_g^A$-invariant state

$$\bar S_A = \frac{1}{|G|}\sum_{g\in G} V_g^A S_A V_g^{A*}.$$

By Proposition 7.31, the function $S \to I(S, \Phi)$ is concave, and therefore

$$I(\bar S_A, \Phi) \ge \frac{1}{|G|}\sum_{g\in G} I(V_g^A S_A V_g^{A*}, \Phi) = I(S_A, \Phi),$$

where the last equality follows from (9.5). □
Example 9.4. Consider the depolarizing channel (6.49). By using unitary covariance,
one sees that $I(S, \Phi)$ achieves its maximum at the chaotic state $\bar S = I/d$, for
which $H(\bar S) = H(\Phi[\bar S]) = \log d$. The entropy exchange $H(\bar S, \Phi)$ is given by the
relation (7.51), whence

$$C_{ea}(\Phi) = \log d^2 + \left(1 - p\,\frac{d^2-1}{d^2}\right)\log\left(1 - p\,\frac{d^2-1}{d^2}\right) + p\,\frac{d^2-1}{d^2}\log\frac{p}{d^2}. \tag{9.7}$$

This should be compared with the unassisted classical capacity $C$, which is equal to
$C_\chi$, given by the formula (8.19).

One can see, in particular, that the gain of entanglement assistance $C_{ea}/C \to d + 1$
in the limit of strong noise $p \to 1$ (note that both capacities tend to zero!). Moreover,
taking the maximal possible value $p = \frac{d^2}{d^2-1}$, we obtain

$$C_{ea} = \log\frac{d^2}{d^2-1}, \qquad
C = C_\chi = \frac{1}{d+1}\log\frac{d}{d+1} + \frac{d}{d+1}\log\frac{d^2}{d^2-1}.$$

The ratio $C_{ea}/C$ increases monotonically from the value 5.08 for $d = 2$, tightly
approaching the asymptotic line $2(d + 1)$ as $d$ grows. Recall (Proposition 6.40) that the
depolarizing channel is entanglement-breaking for all $p \ge d/(d + 1)$. Nevertheless,
$C_{ea} > C$ also in this case. In the next section we will show that this inequality holds
generically even for q-c channels.
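
The numbers quoted in Example 9.4 can be reproduced with a few lines of code. The sketch
below is a hedged illustration: it assumes that the depolarizing channel is parametrized as
$S \to (1-p)S + p\,\frac{I}{d}\operatorname{Tr}S$ with $0 \le p \le d^2/(d^2-1)$, that the
entanglement-assisted capacity is $\log d^2$ minus the entropy of the spectrum
$\{1 - p\frac{d^2-1}{d^2},\ \frac{p}{d^2}\ (\times (d^2-1))\}$, and that the unassisted capacity is
$\log d$ minus the entropy of $\{1 - p\frac{d-1}{d},\ \frac{p}{d}\ (\times (d-1))\}$. These
formulas are standard for this parametrization but are not copied from (9.7) and (8.19), and
the function names are hypothetical.

```python
import math

def xlog2x(x):
    """x * log2(x), extended by 0 at x <= 0 to absorb rounding at the boundary."""
    return x * math.log2(x) if x > 0 else 0.0

def c_ea(d, p):
    """Assumed entanglement-assisted capacity of the depolarizing channel, in bits."""
    a = 1 - p * (d * d - 1) / (d * d)   # nondegenerate eigenvalue
    b = p / (d * d)                     # eigenvalue of multiplicity d^2 - 1
    return 2 * math.log2(d) + xlog2x(a) + (d * d - 1) * xlog2x(b)

def c_chi(d, p):
    """Assumed unassisted capacity of the depolarizing channel, in bits."""
    a = 1 - p * (d - 1) / d             # nondegenerate eigenvalue
    b = p / d                           # eigenvalue of multiplicity d - 1
    return math.log2(d) + xlog2x(a) + (d - 1) * xlog2x(b)

def gain_at_max_noise(d):
    """Ratio C_ea / C at the maximal parameter value p = d^2 / (d^2 - 1)."""
    p_max = d * d / (d * d - 1)
    return c_ea(d, p_max) / c_chi(d, p_max)

if __name__ == "__main__":
    for d in (2, 3, 4, 10):
        print(d, round(gain_at_max_noise(d), 3), 2 * (d + 1))
```

For $d = 2$ the ratio is about 5.08, and for larger $d$ it stays below and close to $2(d+1)$,
in agreement with the text.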


Here, entanglement again appears as a “catalyst” of the transmission of classical 
information through the quantum channel - it may indefinitely increase the classical 
capacity of noisy channels, while being unable to transmit the information on its own. 

Exercise 9.5. Show that for a qubit unital channel of the form (6.59) with $\Lambda$
given by (6.64),

$$C_{ea}(\Phi) = I(\bar S, \Phi) = 2\log 2 + \sum_{\gamma=0,x,y,z} \mu_\gamma\log\mu_\gamma.$$

Hint: The entropy exchange $H(\bar S, \Phi)$ can be computed by using Proposition 7.29
and the Kraus representation (6.64), resulting in

$$H(\bar S, \Phi) = -\sum_{\gamma=0,x,y,z} \mu_\gamma\log\mu_\gamma.$$

Compare this with the capacity $C_\chi(\Phi)$, given by formula (8.21).

In general, there are simple inequalities relating entanglement-assisted and unassisted
classical capacities:

$$C_\chi(\Phi) \le C_{ea}(\Phi) \le \log d + C_\chi(\Phi). \tag{9.8}$$

Take $S_A$ to be a mixture of arbitrary pure states $S_j$ with probabilities $p_j$. In this case,
(7.50) implies

$$I(S_A, \Phi) \le H\Big(\sum_j p_j S_j\Big) + \Big[H\Big(\Phi\Big[\sum_j p_j S_j\Big]\Big) - \sum_j p_j H(\Phi[S_j])\Big].$$

Taking the maxima in both sides and using (9.4), we obtain the second inequality
in (9.8).

Although the entanglement-assisted protocol uses additional resources compared
to unassisted transmission of classical information, the first inequality in (9.8) is not
quite obvious, because of the special nature of the encoding procedure used by $A$. To
avoid this, let us show that the definition of $C_{ea}^{(1)}(\Phi)$, and hence of $C_{ea}(\Phi)$, can be
formulated without explicit introduction of the encoding operations $\mathcal E_A^i$, namely,

$$C_{ea}^{(1)}(\Phi) = \sup_{\{\pi_i, S_{AB}^i\}\in\Sigma_B}\Big[H\Big(\sum_i \pi_i(\Phi\otimes\mathrm{Id}_B)[S_{AB}^i]\Big) - \sum_i \pi_i H\big((\Phi\otimes\mathrm{Id}_B)[S_{AB}^i]\big)\Big], \tag{9.9}$$

where $\Sigma_B$ is the collection of families of the states $\{S_{AB}^i\}$ that satisfy the condition
that their partial states $S_B^i$ may not depend on $i$, $S_B^i = S_B$.
The first inequality in (9.8) then follows by taking $S_{AB}^i = S_A^i\otimes S_B$. To prove (9.9),
it is sufficient to establish the following fact.


Lemma 9.6. Let $\{S_{AB}^i\}$ be a family of the states satisfying the condition $S_B^i = S_B$.
Then there exist a pure state $S_{AB}$ and encodings $\mathcal E_A^i$ such that

$$S_{AB}^i = (\mathcal E_A^i\otimes\mathrm{Id}_B)[S_{AB}]. \tag{9.10}$$

Proof. For simplicity assume first that $S_B$ is nondegenerate. In this case,

$$S_B = \sum_{k=1}^d \lambda_k |e_k^B\rangle\langle e_k^B|,$$

where $\lambda_k > 0$ and $\{|e_k^B\rangle\}$ is an orthonormal basis in $\mathcal H_B$. Let $\{|e_k^A\rangle\}$ be an
orthonormal basis in $\mathcal H_A$. For a vector $|\psi_A\rangle = \sum_{k=1}^d c_k|e_k^A\rangle$ we denote $|\bar\psi_B\rangle =
\sum_{k=1}^d \bar c_k|e_k^B\rangle$. The map $|\psi_A\rangle \to |\bar\psi_B\rangle$ is an anti-isomorphism of $\mathcal H_A$ and $\mathcal H_B$. Set

$$|\psi_{AB}\rangle = \sum_{k=1}^d \sqrt{\lambda_k}\,|e_k^A\rangle\otimes|e_k^B\rangle,$$

so that $S_{AB} = |\psi_{AB}\rangle\langle\psi_{AB}|$, and define encodings by the relation

$$\mathcal E_A^i\big[|\psi_A\rangle\langle\varphi_A|\big] = \langle\bar\psi_B|S_B^{-1/2} S_{AB}^i S_B^{-1/2}|\bar\varphi_B\rangle, \qquad |\psi_A\rangle, |\varphi_A\rangle \in \mathcal H_A.$$

In this case, one can check that $\mathcal E_A^i$ are indeed channels fulfilling the formula (9.10).

In the case where $S_B$ is degenerate, the above construction should be modified by
replacing $S_B^{-1/2} S_{AB}^i S_B^{-1/2}$ in the above formula with $\sqrt{S_B^-}\,S_{AB}^i\sqrt{S_B^-} + P_B^0$, where
$S_B^-$ is the generalized inverse of $S_B$ and $P_B^0$ is the projection onto the null subspace
of $S_B$. □

9.2 The classical capacities of quantum observables 

In this section, we wish to compare the classical capacities $C$, $C_{ea}$ for the special class
of q-c channels, when it is possible to establish a necessary and sufficient condition
for their coincidence. Let $M = \{M_y; y \in \mathcal Y\}$ be a quantum observable in a Hilbert
space $\mathcal H$. Consider the measurement channel

$$\mathcal M[S] = \sum_y |e_y\rangle\langle e_y|\operatorname{Tr} S M_y. \tag{9.11}$$

Since any q-c channel is entanglement-breaking, its classical capacity is given by the
one-letter expression

$$C(\mathcal M) = C_\chi(\mathcal M) = \max_\pi \mathcal J(\pi; M),$$

where $\pi = \{\pi_x, S_x\}$ is an ensemble that assigns the probabilities $\pi_x$ to states $S_x$ and

$$\mathcal J(\pi; M) = H(P_{\bar S_\pi}) - \sum_x \pi_x H(P_{S_x})$$

is the Shannon information between the input $x$ and the output $y$. Here, $\bar S_\pi =
\sum_x \pi_x S_x$, $P_S = \{\operatorname{Tr} S M_y\}$ is the probability distribution of the measurement
outcomes and $H(P)$ is the Shannon entropy of a probability distribution $P$. This can be
rewritten as

$$C(\mathcal M) = \max_S\Big[H(P_S) - \min_{\pi:\,\bar S_\pi = S}\sum_x \pi_x H(P_{S_x})\Big], \tag{9.12}$$

where the minimum is over input ensembles $\pi$ with the fixed average state $\bar S_\pi = S$.

Next, consider the entanglement-assisted capacity which, according to Theorem 9.1,
is given by the formula

$$C_{ea}(\mathcal M) = \max_S I(S; \mathcal M), \tag{9.13}$$





where

$$I(S; \mathcal M) = H(S) + H(\mathcal M[S]) - H(S, \mathcal M)$$

is the quantum mutual information. Here, $H(\cdot)$ is the von Neumann entropy and
$H(S, \mathcal M)$ is the entropy exchange. Let $p_y = \operatorname{Tr} S M_y$ and $V_y$ be an operator satisfying
$M_y = V_y^* V_y$, for example, $V_y = M_y^{1/2}$. Then the density operator $\frac{V_y S V_y^*}{p_y} =
S(y|M)$ can be interpreted as the posterior state of the measurement of observable $M$
with instrument $S \to \{V_y S V_y^*\}$ in state $S$. We use the following formula [186]:

$$I(S; \mathcal M) = H(S) - \sum_y (\operatorname{Tr} S M_y)\,H(S(y|M)). \tag{9.14}$$

Indeed, writing the Kraus decomposition

$$\mathcal M[S] = \sum_{y,j} |e_y\rangle\langle h_j|V_y S V_y^*|h_j\rangle\langle e_y|$$

and applying Proposition 7.29, we obtain

$$\begin{aligned}
H(S, \mathcal M) &= H\Big(\sum_y V_y S V_y^*\otimes|e_y\rangle\langle e_y|\Big) \\
&= H\Big(\sum_y p_y S(y|M)\otimes|e_y\rangle\langle e_y|\Big) \\
&= H(P_S) + \sum_y p_y H(S(y|M)),
\end{aligned}$$

while $H(\mathcal M[S]) = H(P_S)$, hence (9.14). The entanglement-assisted classical capacity
of the channel $\mathcal M$ follows by substituting this expression into (9.13).

Let us now describe the ensemble-observable duality. If $M = \{M_y; y \in \mathcal Y\}$ is a
quantum observable and $\pi = \{p_x, S_x; x \in \mathcal X\}$ an ensemble of quantum states, then

$$p_{xy} = p_x\operatorname{Tr} S_x M_y$$

is a probability distribution on $\mathcal X\times\mathcal Y$. On the other hand,

$$p_{xy} = p'_y\operatorname{Tr} S'_y M'_x,$$

where, denoting $\bar S_\pi = \sum_x p_x S_x$, we have $p'_y S'_y = \bar S_\pi^{1/2} M_y \bar S_\pi^{1/2}$, so that $p'_y =
\operatorname{Tr}\bar S_\pi M_y$ and $M'_x = p_x \bar S_\pi^{-1/2} S_x \bar S_\pi^{-1/2}$. Here $M'_\pi = \{M'_x; x \in \mathcal X\}$ is the new
observable, and $\pi' = \{p'_y, S'_y; y \in \mathcal Y\}$ is the new ensemble. Therefore, the Shannon
information between $x$, $y$ is

$$\mathcal J(\pi, M) = \mathcal J(\pi', M'_\pi).$$

Introducing the ensemble $\pi'_S = \left\{\operatorname{Tr} S M_y,\ \frac{S^{1/2} M_y S^{1/2}}{\operatorname{Tr} S M_y}\right\}$ for a given state $S$, we
have the following duality relations.
Proposition 9.7. Denoting $A(\pi'_S) = \max_{M''}\mathcal J(\pi'_S, M'')$ the accessible information
and $\chi(\pi'_S) = H\big(\sum_y p'_y S'_y\big) - \sum_y p'_y H(S'_y)$ the $\chi$-quantity of the dual ensemble
$\pi'_S$, we have

$$C(\mathcal M) = \max_\pi \mathcal J(\pi, M) = \max_S A(\pi'_S), \tag{9.15}$$

$$C_{ea}(\mathcal M) = \max_S I(S; \mathcal M) = \max_S \chi(\pi'_S). \tag{9.16}$$

Proof. Relation (9.15) follows from

$$\begin{aligned}
\max_\pi \mathcal J(\pi, M) &= \max_S \max_{\pi:\,\bar S_\pi = S} \mathcal J(\pi, M) \\
&= \max_S \max_{M'_\pi:\,\bar S_\pi = S} \mathcal J(\pi'_S, M'_\pi) \\
&= \max_S \max_{M''} \mathcal J(\pi'_S, M'') \\
&= \max_S A(\pi'_S),
\end{aligned}$$

where in the third equality we used the fact that for any observable $M'' = \{M''_x\}$ there
is an ensemble $\pi = \left\{\operatorname{Tr} S M''_x,\ \frac{S^{1/2} M''_x S^{1/2}}{\operatorname{Tr} S M''_x}\right\}$ such that $M'' = M'_\pi$ for this ensemble.

Notice that $\sum_y p'_y S'_y = S$, and $H(S'_y) = H\left(\frac{V_y S V_y^*}{p'_y}\right)$, where $V_y$ is an arbitrary
operator that satisfies $M_y = V_y^* V_y$, because the operators $V_y S V_y^* = V_y S^{1/2} S^{1/2} V_y^*$
and $S^{1/2} V_y^* V_y S^{1/2} = S^{1/2} M_y S^{1/2} = p'_y S'_y$ are unitarily equivalent via polar
decomposition and hence have the same spectrum. The density operator $\frac{V_y S V_y^*}{p'_y} =
S(y|M)$ is the posterior state of the measurement of the observable $M$ with the
instrument $\{V_y\}$ in the state $S$. Thus, the $\chi$-quantity of the dual ensemble is

$$\chi(\pi'_S) = H(S) - \sum_y p'_y H(S(y|M)) = I(S; \mathcal M) \tag{9.17}$$

and hence, in addition to (9.15), we have, via (9.14), the duality relation (9.16). □

Let us now recall the bound of Theorem 5.9

$$A(\pi) \le H\Big(\sum_y p_y S_y\Big) - \sum_y p_y H(S_y) \equiv \chi(\pi) \tag{9.18}$$

with the equality attained if and only if the operators $p_y S_y$ all commute. Applying
this to the dual situation, we obtain

$$A(\pi'_S) \le H\Big(\sum_y p'_y S'_y\Big) - \sum_y p'_y H(S'_y) = \chi(\pi'_S).$$
V y / y 
In this case, the necessary and sufficient condition for the equality becomes

$$S^{1/2} M_y S M_{y'} S^{1/2} = S^{1/2} M_{y'} S M_y S^{1/2} \tag{9.19}$$

for all $y$, $y'$. Therefore, by (9.15), (9.16), the necessary and sufficient condition for the
equality $C_{ea}(\mathcal M) = C(\mathcal M)$ is that condition (9.19) is fulfilled for a density operator
$S$ maximizing the quantity (9.17). In particular, in the case of a covariant observable
$M_g = V_g M_0 V_g^*$, where $V_g$ is an irreducible representation of a symmetry group $G$,
Proposition 9.3 implies $S = I/d$ and condition (9.19) reduces to the fact that all
components of the observable commute, i.e. it is essentially a classically randomized
observable.

Consider the case of an overcomplete system $M_y = |\psi_y\rangle\langle\psi_y|$ in a $d$-dimensional
Hilbert space $\mathcal H$, where $S = I/d$. The corresponding ensemble of pure states is
$\pi'_S = \pi = \left\{\frac{\langle\psi_y|\psi_y\rangle}{d},\ \frac{|\psi_y\rangle\langle\psi_y|}{\langle\psi_y|\psi_y\rangle}\right\}$ and $\chi(\pi) = \log d = C_{ea}(\mathcal M)$ (notice that this is
also equal to the classical capacity of the c-q channel $y \to \frac{|\psi_y\rangle\langle\psi_y|}{\langle\psi_y|\psi_y\rangle}$, since this is the
maximal possible value). Condition (9.19) amounts to

$$|\psi_y\rangle\langle\psi_y|\psi_{y'}\rangle\langle\psi_{y'}| = |\psi_{y'}\rangle\langle\psi_{y'}|\psi_y\rangle\langle\psi_y|.$$

We can always assume that the vectors $|\psi_y\rangle$ are all pairwise linearly independent. In
this case, the last condition is equivalent to the fact that they form an orthonormal
basis.

Thus we conclude that $C_{ea}(\mathcal M) > C(\mathcal M)$, unless $\{\psi_y\}$ is an orthonormal basis.
Now, let $\Theta$ be the unit sphere in $\mathcal H$ and let $\nu(d\theta)$ be the uniform distribution on $\Theta$.
To avoid notational confusion with the differential, we denote in the argument below
the dimensionality with $d_A = \dim\mathcal H$. According to relation (6.55), the family of
unit vectors $\{|\theta\rangle; \theta \in \Theta\}$ can be considered a continuous overcomplete system. The
corresponding measurement channel $\mathcal M$ maps the density operator $S$ to the probability
distribution on $\Theta$:

$$\mathcal M: S \to d_A\langle\theta|S|\theta\rangle\,\nu(d\theta). \tag{9.20}$$

It can be shown that, similar to the discrete overcomplete systems considered above,
$C_{ea}(\mathcal M) = \log d_A$. Let us show that

$$C(\mathcal M) = \log d_A - \log e\sum_{k=2}^{d_A}\frac1k. \tag{9.21}$$

Thus, for $d_A \to \infty$ we have $C(\mathcal M) \to \log e\,(1 - \gamma)$, where $\gamma \approx 0.577$ is Euler’s
constant. At the same time, $C_{ea}(\mathcal M) = \log d_A \to \infty$.
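
The closed form (9.21) lends itself to a quick numerical check. The sketch below is an
illustration under a stated assumption, not a reproduction of the book's derivation: it
assumes that for a uniformly distributed unit vector the squared overlap
$t = |\langle\theta|\theta'\rangle|^2$ has density $(d_A - 1)(1 - t)^{d_A - 2}$ on $[0, 1]$, evaluates
$\int_0^1 d_A t\,\ln(d_A t)\,(d_A-1)(1-t)^{d_A-2}\,dt$ by the midpoint rule, and compares the result
with $\ln d_A - \sum_{k=2}^{d_A} 1/k$ (both in nats). The function names are hypothetical.

```python
import math

def capacity_closed_form(d):
    """Right-hand side of (9.21) in nats: ln d - sum_{k=2}^{d} 1/k."""
    return math.log(d) - sum(1.0 / k for k in range(2, d + 1))

def capacity_by_quadrature(d, steps=100_000):
    """Midpoint-rule value of the integral of d*t*ln(d*t) against the assumed
    density (d-1)(1-t)^(d-2) of t = |<theta|theta'>|^2 on [0, 1], in nats."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += d * t * math.log(d * t) * (d - 1) * (1 - t) ** (d - 2)
    return total * h

if __name__ == "__main__":
    for d in (2, 3, 5, 10):
        print(d, capacity_closed_form(d), capacity_by_quadrature(d))
    # Large-dimension limit, to be compared with 1 - gamma (about 0.4228 nats).
    print(capacity_closed_form(10**5))
```

Multiplying the values by $\log_2 e$ converts them to bits; the limit $1 - \gamma$ nats corresponds
to roughly 0.61 bits, while the entanglement-assisted capacity $\log d_A$ grows without bound.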

All outcome probability distributions (9.20) are absolutely continuous with respect
to $\nu(d\theta)$, and hence we can use the differential entropy

$$h(p_S) = -\int_\Theta p_S(\theta)\log p_S(\theta)\,\nu(d\theta),$$

where $p_S(\theta) = d_A\langle\theta|S|\theta\rangle$, to get the continuous analog of formula (9.12)

$$C(\mathcal M) = C_\chi(\mathcal M) = \max_S\Big[h(p_S) - \min_{\pi:\,\bar S_\pi = S}\sum_x \pi_x h(p_{S_x})\Big]. \tag{9.22}$$

The channel $\mathcal M$ is covariant with respect to the irreducible action of the unitary group,
in the sense that

$$\mathcal M(USU^*) = d_A\langle U^*\theta|S|U^*\theta\rangle\,\nu(d\theta).$$

Therefore, similar to (8.15),

$$C_\chi(\mathcal M) = h\Big(\mathcal M\Big[\frac{I}{d_A}\Big]\Big) - \min_{\theta'} h\big(\mathcal M(|\theta'\rangle\langle\theta'|)\big). \tag{9.23}$$

But $\mathcal M\big[\frac{I}{d_A}\big]$ is the uniform distribution over $\Theta$, with density $p(\theta) \equiv 1$. Hence,
$h\big(\mathcal M\big[\frac{I}{d_A}\big]\big) = 0$. On the other hand,

$$-h\big(\mathcal M(|\theta'\rangle\langle\theta'|)\big) = \int_\Theta d_A|\langle\theta|\theta'\rangle|^2\log\big[d_A|\langle\theta|\theta'\rangle|^2\big]\,\nu(d\theta). \tag{9.24}$$

By the unitary invariance of $\nu$, this quantity is the same for all $\theta'$. Hence, there is no
need for minimization in (9.23). For its computation, we use relation (6.57). With
this, (9.24) becomes

$$-\int_0^1 (d_A r^2)\log(d_A r^2)\,d(1 - r^2)^{d_A-1} = \int_0^1 [d_A(1 - u)]\log[d_A(1 - u)]\,d\big(u^{d_A-1}\big),$$

where $u = 1 - r^2$, which, after integrations by parts, gives (9.21).

9.3 Proof of the Converse Coding Theorem

Now, we finally come to the proof of Theorem 9.1.

To prove the inequality

$$C_{ea}(\Phi) \le \max_{S_A} I(S_A, \Phi), \tag{9.25}$$

we first prove that

$$C_{ea}^{(1)}(\Phi) \le \max_{S_A} I(S_A, \Phi). \tag{9.26}$$

Let us denote by $\mathcal E_A^i$ the encodings used by $A$. Let $S_{AB}$ be the pure state, initially
shared between $A$ and $B$. In this case, the state of the system $AB$ (resp. $A$) after the
encoding is

$$S_{AB}^i = (\mathcal E_A^i\otimes\mathrm{Id}_B)[S_{AB}], \qquad S_A^i = \mathcal E_A^i[S_A]. \tag{9.27}$$

Note that the partial state of $B$ does not change after the encoding, $S_B^i = S_B$. We will
prove that

$$H\Big(\sum_i \pi_i(\Phi\otimes\mathrm{Id}_B)[S_{AB}^i]\Big) - \sum_i \pi_i H\big((\Phi\otimes\mathrm{Id}_B)[S_{AB}^i]\big) \le I\Big(\sum_i \pi_i S_A^i, \Phi\Big). \tag{9.28}$$

The maximum of the left hand side with respect to all possible $\pi_i$, $\mathcal E_A^i$ is just
$C_{ea}^{(1)}(\Phi)$, from which (9.26) follows.

By using the subadditivity of the quantum entropy, we can evaluate the first term in
the left hand side of (9.28) as

$$H\Big(\sum_i \pi_i(\Phi\otimes\mathrm{Id}_B)[S_{AB}^i]\Big) \le H\Big(\Phi\Big[\sum_i \pi_i S_A^i\Big]\Big) + H(S_B) = H\Big(\Phi\Big[\sum_i \pi_i S_A^i\Big]\Big) + \sum_i \pi_i H(S_B).$$

Here, the first term already provides us with the output entropy from $I\big(\sum_i \pi_i S_A^i; \Phi\big)$.
Let us proceed with the evaluation of the remainder

$$\sum_i \pi_i\big[H(S_B) - H\big((\Phi\otimes\mathrm{Id}_B)[S_{AB}^i]\big)\big].$$

We first show that the term in squared brackets does not exceed $H(S_A^i) -
H\big((\Phi\otimes\mathrm{Id}_{R^i})[S_{AR^i}^i]\big)$, where $R^i$ is the purifying (reference) system for $S_A^i$, and
$S_{AR^i}^i$ is the purified state. To this end consider the unitary extension of the encoding
$\mathcal E_A^i$ with the environment $E_i$, which is initially in a pure state, according to Theorem 6.18.
From (9.27) we see that we can take $R^i = BE_i$ (after the unitary interaction,
which involves only $AE_i$). In this case, again denoting by primes states after the
application of the channel $\Phi$, we have

$$H(S_B) - H\big((\Phi\otimes\mathrm{Id}_B)[S_{AB}^i]\big) = H(S_B) - H(S_{A'B}^i) = -H_i(A'|B), \tag{9.29}$$

where the lower index $i$ of the conditional entropy refers to the joint state $S_{A'B}^i$. Similarly,

$$H(S_A^i) - H\big((\Phi\otimes\mathrm{Id}_{R^i})[S_{AR^i}^i]\big) = H(S_{R^i}^i) - H(S_{A'R^i}^i) = -H_i(A'|R^i) = -H_i(A'|BE_i),$$

which is greater than or equal to (9.29), by the monotonicity of the conditional entropy.
Using the concavity of the function $S_A \to H(S_A) - H\big((\Phi\otimes\mathrm{Id}_R)[S_{AR}]\big)$, to be
shown below, we get

$$\sum_i \pi_i\big[H(S_A^i) - H\big((\Phi\otimes\mathrm{Id}_{R^i})[S_{AR^i}^i]\big)\big] \le H\Big(\sum_i \pi_i S_A^i\Big) - H\big((\Phi\otimes\mathrm{Id}_R)[S_{AR}]\big),$$

where $S_{AR}$ is the state purifying $\sum_i \pi_i S_A^i$ with a reference system $R$.

To complete this proof, it remains to show the above concavity. By introducing the
environment $E$ for the channel $\Phi$, we have

$$H(S_A) - H\big((\Phi\otimes\mathrm{Id}_R)[S_{AR}]\big) = H(S_R) - H(S_{A'R}) = H(S_{A'E'}) - H(S_{E'}) = H(A'|E').$$

Now, $H(A'|E')$ is a concave function of $S_{A'E'}$ by Corollary 7.27. The map $S_A \to
S_{A'E'}$ is affine and therefore $H(A'|E')$ is a concave function of $S_A$.

Applying the same argument to the channel $\Phi^{\otimes n}$ gives

$$C_{ea}^{(1)}(\Phi^{\otimes n}) \le \max_{S_{A^n}} I(S_{A^n}, \Phi^{\otimes n}). \tag{9.30}$$

Then, from the subadditivity of the quantum mutual information (property iii. in
Proposition 7.31) we have

$$\max_{S_{12}} I(S_{12}, \Phi_1\otimes\Phi_2) = \max_{S_1} I(S_1, \Phi_1) + \max_{S_2} I(S_2, \Phi_2),$$

whence the remarkable additivity follows

$$\max_{S_{A^n}} I(S_{A^n}, \Phi^{\otimes n}) = n\max_{S_A} I(S_A, \Phi).$$

Therefore, we finally obtain (9.25). □


9.4 Proof of the Direct Coding Theorem 

Here, we prove the inequality

$$C_{ea}(\Phi) \ge \max_{S_A} I(S_A, \Phi). \tag{9.31}$$
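First, let us show by generalizing the dense coding protocol, that

$$C_{ea}^{(1)}(\Phi^{\otimes n}) \ge I\Big(\frac{P}{\dim P}, \Phi^{\otimes n}\Big) \tag{9.32}$$

for an arbitrary projection $P$ in $\mathcal H_A^{\otimes n}$.

Indeed, let $P = \sum_{k=1}^m |e_k\rangle\langle e_k|$, where $\{e_k; k = 1, \dots, m = \dim P\}$ is an
orthonormal system. Consider the discrete Weyl system $\{W_{\alpha\beta}; \alpha, \beta = 0, \dots, m - 1\}$,
defined by the relations (6.50) on the subspace $\mathcal H_P = P\mathcal H_A^{\otimes n}$.

Since $\dim\mathcal H_A \le \dim\mathcal H_B$, we can assume that $\mathcal H_A \subset \mathcal H_B$. Let

$$|\psi_{AB}\rangle = \frac{1}{\sqrt m}\sum_{k=1}^m |e_k\rangle\otimes|e_k\rangle.$$

Exercise 9.8. Show that the system $\{(W_{\alpha\beta}\otimes I_B)|\psi_{AB}\rangle; \alpha, \beta = 0, \dots, m - 1\}$
is an orthonormal basis in $\mathcal H_P\otimes\mathcal H_P$. In particular,

$$\sum_{\alpha,\beta=0}^{m-1} (W_{\alpha\beta}\otimes I_B)|\psi_{AB}\rangle\langle\psi_{AB}|(W_{\alpha\beta}\otimes I_B)^* = P\otimes P. \tag{9.33}$$

The operators $W_{\alpha\beta}$ will play a role similar to the Pauli matrices in the dense coding
protocol for qubits. Take the classical signal to be transmitted as $i = (\alpha, \beta)$, with
equal probabilities $1/m^2$, the entangled state $|\psi_{AB}\rangle\langle\psi_{AB}|$, and the unitary encodings
$\mathcal E_A^i[S] = W_{\alpha\beta} S W_{\alpha\beta}^*$. Now, we have

$$C_{ea}^{(1)}(\Phi^{\otimes n}) \ge H\Big(\frac{1}{m^2}\sum_{\alpha,\beta}(\Phi\otimes\mathrm{Id}_B)^{\otimes n}\big[S_{AB}^{\alpha\beta}\big]\Big) - \frac{1}{m^2}\sum_{\alpha,\beta} H\Big((\Phi\otimes\mathrm{Id}_B)^{\otimes n}\big[S_{AB}^{\alpha\beta}\big]\Big),$$

where $S_{AB}^{\alpha\beta} = (W_{\alpha\beta}\otimes I_{B^n})|\psi_{AB}\rangle\langle\psi_{AB}|(W_{\alpha\beta}\otimes I_{B^n})^*$. By (9.33) the first term in
the right hand side is equal to $H\big((\Phi\otimes\mathrm{Id}_B)^{\otimes n}\big[\frac Pm\otimes\frac Pm\big]\big) = H\big(\frac Pm\big) + H\big(\Phi^{\otimes n}\big[\frac Pm\big]\big)$. Since
$S_{AB}^{\alpha\beta}$ is a purification of $\frac Pm$, the entropies in the second term are all equal to $H\big(\frac Pm, \Phi^{\otimes n}\big)$.
By the expression for the quantum mutual information

$$I(S_A, \Phi) = H(S_A) + H(\Phi[S_A]) - H(S_A; \Phi)$$

with $S_A = \frac Pm$, this proves (9.32). For future use, note that the last term in the quantum
mutual information - the entropy exchange - is equal to the final environment entropy
$H(S'_E) = H(\tilde\Phi[S_A])$, where $\tilde\Phi$ is the complementary channel from the system space
$\mathcal H_A$ to the environment space $\mathcal H_E$.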
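
The claim of Exercise 9.8 can be verified numerically for small $m$. The sketch below is an
illustration under an assumed convention, since relations (6.50) are not reproduced here: it
takes $W_{\alpha\beta} = U^\alpha V^\beta$ with $V|e_k\rangle = e^{2\pi i k/m}|e_k\rangle$ and
$U|e_k\rangle = |e_{(k+1) \bmod m}\rangle$, builds the $m^2$ vectors $(W_{\alpha\beta}\otimes I)|\psi_{AB}\rangle$ in
coordinates, and computes their Gram matrix. The function names are hypothetical.

```python
import cmath
import itertools

def weyl_bell_vector(alpha, beta, m):
    """Coordinates of (W_{alpha,beta} (x) I)|psi> in the product basis |e_a> (x) |e_b>.

    Assumed convention: W_{alpha,beta} = U^alpha V^beta, where V multiplies |e_k>
    by exp(2*pi*i*k/m) and U shifts |e_k> to |e_{(k+1) mod m}>.
    """
    vec = [0j] * (m * m)
    for k in range(m):
        phase = cmath.exp(2j * cmath.pi * beta * k / m)
        a = (k + alpha) % m          # index after applying U^alpha
        vec[a * m + k] = phase / m ** 0.5
    return vec

def inner(u, v):
    """Hermitian inner product <u|v>."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def gram_matrix(m):
    """Gram matrix of the m^2 vectors indexed by (alpha, beta)."""
    labels = list(itertools.product(range(m), repeat=2))
    vectors = [weyl_bell_vector(a, b, m) for a, b in labels]
    return [[inner(u, v) for v in vectors] for u in vectors]

if __name__ == "__main__":
    g = gram_matrix(3)
    print(max(abs(g[i][j] - (1 if i == j else 0)) for i in range(9) for j in range(9)))
```

A Gram matrix equal to the identity means that the $m^2$ vectors are orthonormal in a space
of dimension $m^2$, so they form a basis and the sum of their projectors is the identity on
$\mathcal H_P\otimes\mathcal H_P$, which is the content of (9.33).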
Now let $S_A = S$ be an arbitrary state in $\mathcal H_A$ and let $P^{n,\delta}$ be the strongly typical
projector of the state $S^{\otimes n}$ in the space $\mathcal H_A^{\otimes n}$, defined below. We shall prove that for
an arbitrary channel $\Psi$ from $\mathcal H_A$ to a possibly other Hilbert space $\tilde{\mathcal H}$ the following
relation holds

$$\lim_{\delta\to0}\lim_{n\to\infty}\frac1n H\Big(\Psi^{\otimes n}\Big[\frac{P^{n,\delta}}{\dim P^{n,\delta}}\Big]\Big) = H(\Psi[S]).$$

This would imply, by the above expressions for the mutual information and the entropy
exchange, that

$$\lim_{\delta\to0}\lim_{n\to\infty}\frac1n I\Big(\frac{P^{n,\delta}}{\dim P^{n,\delta}}, \Phi^{\otimes n}\Big) = I(S, \Phi), \tag{9.34}$$

and hence, by (9.32), the required inequality (9.31).


Definition 9.9. Let us fix a small positive $\delta$, and let $\lambda_j$ be the eigenvalues and $|e_j\rangle$ the eigenvectors of the density operator $S$. In this case, the eigenvalues and eigenvectors of $S^{\otimes n}$ are $\lambda_J = \lambda_{j_1}\cdots\lambda_{j_n}$, $|e_J\rangle = |e_{j_1}\rangle \otimes \cdots \otimes |e_{j_n}\rangle$, where $J = (j_1,\dots,j_n)$. The sequence $J$ is called strongly typical if the numbers $N(j|J)$ of times that the symbol $j$ appears in $J$ satisfy the condition

$$\left|\frac{N(j|J)}{n} - \lambda_j\right| < \delta, \qquad j = 1,\dots,d,$$

and $N(j|J) = 0$ if $\lambda_j = 0$. Let us denote the collection of all strongly typical sequences as $T^{n,\delta}$. The strongly typical projector is defined as the following spectral projector of the density operator $S^{\otimes n}$:

$$P^{n,\delta} = \sum_{J \in T^{n,\delta}} |e_J\rangle\langle e_J|.$$


Let $P^n$ be the probability distribution given by the eigenvalues $\lambda_J$. Then, by the Law of Large Numbers, $P^n(T^{n,\delta}) \to 1$ as $n \to \infty$. For an arbitrary function $f(j)$, $j = 1,\dots,d$, and a sequence $J = (j_1,\dots,j_n) \in T^{n,\delta}$,

$$\left|\frac{f(j_1) + \cdots + f(j_n)}{n} - \sum_{j=1}^{d} \lambda_j f(j)\right| < \delta\, d \max |f|. \tag{9.35}$$


In particular, any strongly typical sequence is typical: taking $f(j) = -\log\lambda_j$ provides

$$n[H(S) - \delta_1] < -\log\lambda_J < n[H(S) + \delta_1], \tag{9.36}$$

where $\delta_1 = \delta\, d \max_{\lambda_j > 0}(-\log\lambda_j)$ (the converse is not true: not every typical sequence is strongly typical).
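Definition 9.9 and the bound (9.36) can be explored by brute-force enumeration for small $n$. The following sketch is illustrative only: the eigenvalues `lam`, the length `n` and the tolerance `delta` are arbitrary choices, symbols are indexed from 0, and logarithms are taken to base 2.

```python
import itertools
import math

def is_strongly_typical(seq, lam, delta):
    """Definition 9.9: |N(j|J)/n - lam_j| < delta for all j, and N(j|J) = 0 if lam_j = 0."""
    n = len(seq)
    for j, p in enumerate(lam):
        count = seq.count(j)
        if p == 0 and count != 0:
            return False
        if abs(count / n - p) >= delta:
            return False
    return True

lam = (0.75, 0.25)        # eigenvalues of an illustrative qubit state S
n, delta = 12, 0.1
d = len(lam)

# Enumerate all d^n sequences and keep the strongly typical ones.
typical = [J for J in itertools.product(range(d), repeat=n)
           if is_strongly_typical(J, lam, delta)]

entropy = -sum(p * math.log2(p) for p in lam if p > 0)
delta1 = delta * d * max(-math.log2(p) for p in lam if p > 0)

def minus_log_prob(J):
    """-log lambda_J for the product eigenvalue lambda_J."""
    return -sum(math.log2(lam[j]) for j in J)

# Probability mass carried by the strongly typical set.
mass = sum(2.0 ** (-minus_log_prob(J)) for J in typical)

in_window = all(n * (entropy - delta1) < minus_log_prob(J) < n * (entropy + delta1)
                for J in typical)
print(len(typical), round(mass, 3), in_window)
```

For these parameters a sequence is strongly typical exactly when the symbol with eigenvalue 0.25 occurs 2, 3 or 4 times. Increasing `n` at fixed `delta` pushes `mass` towards 1, in line with the Law of Large Numbers argument above.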

We shall need the following combinatorial result, the proof of which can be found in [44].


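Exercise 9.10. Show that the size of the set $T^{n,\delta}$ is estimated as

$$2^{n[H(S) - \Delta_n(\delta)]} < |T^{n,\delta}| < 2^{n[H(S) + \Delta_n(\delta)]}, \tag{9.37}$$

where $H(S) = -\sum_{j=1}^{d} \lambda_j \log\lambda_j$ and $\lim_{\delta\to 0}\lim_{n\to\infty} \Delta_n(\delta) = 0$.

Proof of the relation (9.34). Denote $d_{n,\delta} = \dim P^{n,\delta} = |T^{n,\delta}|$ and $S^{n,\delta} = P^{n,\delta}/d_{n,\delta}$. We will prove that

$$\lim_{\delta\to 0}\lim_{n\to\infty}\frac{1}{n}\, H\!\left(\Psi^{\otimes n}[S^{n,\delta}]\right) = H(\Psi[S]) \tag{9.38}$$

for an arbitrary channel $\Psi$. We have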

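$$\begin{aligned} nH(\Psi[S]) - H(\Psi^{\otimes n}[S^{n,\delta}]) &= H(\Psi[S]^{\otimes n}) - H(\Psi^{\otimes n}[S^{n,\delta}]) \\ &= H\!\left(\Psi^{\otimes n}[S^{n,\delta}];\, \Psi^{\otimes n}[S^{\otimes n}]\right) \\ &\quad + \operatorname{Tr}\log\Psi[S]^{\otimes n}\left(\Psi^{\otimes n}[S^{n,\delta}] - \Psi[S]^{\otimes n}\right). \end{aligned} \tag{9.39}$$

Strictly speaking, this formula is correct if the density operator $\Psi[S]^{\otimes n}$ is nondegenerate, which we will assume for the moment. Later we shall show how the argument can be modified to cover the general case.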

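For the first term, by the monotonicity of the relative entropy, we have the estimate

$$H\!\left(\Psi^{\otimes n}[S^{n,\delta}];\, \Psi^{\otimes n}[S^{\otimes n}]\right) \le H\!\left(S^{n,\delta};\, S^{\otimes n}\right),$$

with the right-hand side computed explicitly as

$$\sum_{J \in T^{n,\delta}} \frac{1}{d_{n,\delta}} \log\frac{1}{d_{n,\delta}\lambda_J} = -\log d_{n,\delta} - \frac{1}{d_{n,\delta}}\sum_{J \in T^{n,\delta}} \log\lambda_J,$$

which by (9.36), (9.37) is less than or equal to $n(\delta_1 + \Delta_n(\delta))$, where $\delta_1 + \Delta_n(\delta)$ converges to zero as $n \to \infty$, $\delta \to 0$.

By using the identity

$$\log\Psi[S]^{\otimes n} = \log\Psi[S] \otimes I \otimes \cdots \otimes I + \cdots + I \otimes \cdots \otimes I \otimes \log\Psi[S],$$

and introducing the operator $F = \Psi^*[\log\Psi[S]]$, where $\Psi^*$ is the dual channel, we can rewrite the second term as

$$n \operatorname{Tr} \frac{F \otimes I \otimes \cdots \otimes I + \cdots + I \otimes \cdots \otimes I \otimes F}{n}\left(S^{n,\delta} - S^{\otimes n}\right) = \frac{n}{d_{n,\delta}} \sum_{J \in T^{n,\delta}} \left[\frac{f(j_1) + \cdots + f(j_n)}{n} - \sum_{j=1}^{d} \lambda_j f(j)\right], \tag{9.40}$$

where $f(j) = \langle e_j|F|e_j\rangle$. By (9.35), the absolute value of this expression is bounded by $n\delta\, d \max|f|$. This establishes (9.38) in the case of a nondegenerate $\Psi[S]$.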

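Coming back to the general case, let us denote by $P_\Psi$ the supporting projector of $\Psi[S]$. In this case, the supporting projector of $\Psi[S]^{\otimes n}$ is $P_\Psi^{\otimes n}$, and the support of $\Psi^{\otimes n}[S^{n,\delta}]$ is contained in the support of $\Psi[S]^{\otimes n} = \Psi^{\otimes n}[S^{\otimes n}]$, because the support of $S^{n,\delta}$ is contained in that of $S^{\otimes n}$. Thus, the second term in (9.39) can be understood as

$$\operatorname{Tr} P_\Psi^{\otimes n} \log\!\left[P_\Psi^{\otimes n}\Psi[S]^{\otimes n}P_\Psi^{\otimes n}\right] P_\Psi^{\otimes n}\left(\Psi^{\otimes n}[S^{n,\delta}] - \Psi[S]^{\otimes n}\right),$$

where now we have the logarithm of a nondegenerate operator in $P_\Psi^{\otimes n}\tilde{\mathcal{H}}^{\otimes n}$. We can then repeat the argument, with $F$ defined as $\Psi^*[P_\Psi(\log P_\Psi\Psi[S]P_\Psi)P_\Psi]$. This completes the proof of (9.34), from which (9.31) follows. □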

9.5 Notes and references 

1. Theorem 9.1 was announced by Bennett, Shor, Smolin and Thapliyal in [24], and 
was proved in the paper [25] by the same authors. 

2. Information capacities of quantum observables were studied by Dall'Arno, D'Ariano, Sacchi [46] and by Holevo [108]. The identity (9.15) was obtained in [46] and (9.16) was obtained in [108]. Relation (9.14) is due to Shirokov [186], who also obtained a general criterion for the coincidence $C_{ea} = C_\chi$, saying that the essential part of the channel should be the c-q channel [187]. The duality between quantum observables and ensembles was introduced by Hall [74]. The value (9.21) was obtained in the paper of Jozsa, Robb and Wootters [124] as the “subentropy” of the chaotic state $\bar{S} = I/d$.

3. We here provide a simplified proof, following [101]. Concerning strongly typical projectors and the solution of Exercise 9.10, see [44].



Chapter 10 

Transmission of quantum information 


As was already stressed, a quantum state is itself a special kind of information re¬ 
source, insofar as it contains statistical uncertainty. Statistical mechanics studies ir¬ 
reversible state changes of a physical system that result from its interaction with the 
environment (and accompanied by loss of information). Information theory in a sense 
solves a reverse problem, namely that of how to reduce these losses to a negligible 
quantity by using a special processing of the system state: encoding before and decoding after the irreversible evolution described by a noisy channel. This chapter is devoted to problems of this kind.

We shall start with a brief review of the methods of encoding the quantum informa¬ 
tion resilient to a certain specific kind of errors (e.g. an arbitrary error in one arbitrary 
qubit), when complete error correction is possible. We then pass to an approximate 
setting, where some measure of fidelity for the processing of a quantum state is introduced. In this situation, the aim is to asymptotically achieve an exact transmission of the quantum state through the composite channel $\Phi^{\otimes n}$ with the maximal possible rate as $n \to \infty$. In this way, another analog of the Shannon Coding Theorem emerges, concerning the transmission of quantum information. This leads to the notion of the quantum capacity, expressed in terms of an entropic quantity called coherent information.
On the other hand, the quantum capacity of a channel turns out to be closely related 
to its cryptographic characteristics, such as the capacity for the secret transmission of 
classical information (the private classical capacity). The role of the eavesdropper in 
such scenarios is played by the environment of the quantum system. 

10.1 Quantum error-correcting codes 

10.1.1 Error correction by repetition 

When transmitting information, one would like to have a code that is robust against 
errors. In the classical case, the existence of such codes for rates below the capacity 
follows from the Shannon Coding Theorem. However, this theorem does not provide 
a constructive method for the design of such codes, and a practical solution to this 
problem is the subject of extensive research in classical information theory. 

A straightforward method of reducing errors is repetition of messages, which, of course, reduces the rate of transmission. Take the binary alphabet $\{0, 1\}$ and assume that the probability $p$ of a bit flip in the process of transmission is small, so that the probability of a change in more than one bit is negligible. Consider the code $0 \to 00$, $1 \to 11$. In this case, having received 00 or 11, one can conclude with a high degree of certitude that the encoded symbol was 0 or, correspondingly, 1. However, if 01 or 10 was received, one can only ascertain an error but not recover the encoded symbol. This defect is easily remedied by adding one more bit to the code: $0 \to 000$ and $1 \to 111$. The last code corrects an arbitrary error in one arbitrary bit of the received message.
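The difference between the two-bit and the three-bit repetition codes can be made concrete with a majority-vote decoder. This is a minimal sketch; the helper names `encode`, `decode` and `flip` are introduced here for illustration only.

```python
from collections import Counter

def encode(bit, copies=3):
    """Repetition code: 0 -> 00...0, 1 -> 11...1."""
    return [bit] * copies

def decode(received):
    """Majority vote; returns None on a tie (error detected but not correctable)."""
    counts = Counter(received)
    (top, top_n), *rest = counts.most_common()
    if rest and rest[0][1] == top_n:
        return None
    return top

def flip(word, position):
    """Flip the bit at the given position."""
    return [b ^ 1 if i == position else b for i, b in enumerate(word)]

# Three copies: any single bit flip is corrected.
print([decode(flip(encode(1), pos)) for pos in range(3)])
# Two copies: a single flip is detected, but the symbol cannot be recovered.
print(decode(flip(encode(0, copies=2), 0)))
```

With two copies the decoder can only report a tie, while with three copies the majority always restores the encoded bit after a single flip.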

A direct generalization of this recipe to the quantum case is impossible, because 
quantum information cannot be copied. By the very nature of quantum information, 
one should be able to transmit robustly not only the basis states but also their superpo¬ 
sition. Although at first glance this seems not feasible, the problem has an ingenious 
solution. Let us consider an example of solving such a problem in an important model 
case. 

Assume that one wants to transmit an arbitrary pure state of a qubit $\psi = a|0\rangle + b|1\rangle$, provided that encoding by states of several qubits is allowed. The encoding in the quantum case is just an isometric mapping. Following the classical analogy, consider first the code that maps the basis vectors as follows:

$$|0\rangle \to |000\rangle, \qquad |1\rangle \to |111\rangle. \tag{10.1}$$

Such a code corrects the “bit flip”, i.e. the change $|0\rangle \leftrightarrow |1\rangle$ in any one of the qubits. We are interested in the arbitrary state $a|0\rangle + b|1\rangle$, which is now encoded into $a|000\rangle + b|111\rangle$. Let, for example, the bit flip occur in the first qubit:

$$a|0\rangle + b|1\rangle \to a|100\rangle + b|011\rangle.$$

In this case, the states $a|000\rangle + b|111\rangle$ and $a|100\rangle + b|011\rangle$ are orthogonal, and hence can be perfectly distinguished to correct the error.

However, such a code does not correct a “phase flip” $|0\rangle \to |0\rangle$, $|1\rangle \to -|1\rangle$. Indeed, after such a phase error in one qubit, we have $a|000\rangle - b|111\rangle$ instead of $a|000\rangle + b|111\rangle$, and in general these states are not orthogonal, and hence not perfectly distinguishable.
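The contrast between the two kinds of error for the code (10.1) can be verified directly. In this sketch the amplitudes `a`, `b` are an arbitrary illustrative choice, and the first qubit is taken as the most significant index of the basis vectors.

```python
import numpy as np

def ket(bits):
    """Computational basis vector |b1 b2 b3> as a length-8 array."""
    v = np.zeros(2 ** len(bits))
    v[int(bits, 2)] = 1.0
    return v

a, b = 0.6, 0.8                          # any amplitudes with |a|^2 + |b|^2 = 1
code = a * ket("000") + b * ket("111")   # encoded state under (10.1)

X = np.array([[0., 1.], [1., 0.]])       # bit flip
Z = np.diag([1., -1.])                   # phase flip
I2 = np.eye(2)

def on_first_qubit(op):
    return np.kron(op, np.kron(I2, I2))

bit_flipped = on_first_qubit(X) @ code      # a|100> + b|011>
phase_flipped = on_first_qubit(Z) @ code    # a|000> - b|111>

overlap_bit = code @ bit_flipped            # zero: the error is perfectly detectable
overlap_phase = code @ phase_flipped        # a^2 - b^2: nonzero in general
print(overlap_bit, overlap_phase)
```

The phase-flipped state is orthogonal to the original one only when $|a| = |b|$, which is why the code (10.1) alone cannot handle phase errors for arbitrary inputs.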

Now remark that the Hadamard transform

$$|0\rangle \to \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle), \qquad |1\rangle \to \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$$

maps a phase flip into a bit flip and vice versa. By converting the code (10.1), we obtain another code

$$\begin{aligned} |0\rangle &\to \frac{1}{2\sqrt{2}}(|0\rangle + |1\rangle)(|0\rangle + |1\rangle)(|0\rangle + |1\rangle), \\ |1\rangle &\to \frac{1}{2\sqrt{2}}(|0\rangle - |1\rangle)(|0\rangle - |1\rangle)(|0\rangle - |1\rangle), \end{aligned} \tag{10.2}$$

which corrects a phase flip in one arbitrary qubit, but does not correct bit flips.

The Shor code, which corrects both phase and bit flips in one qubit, is obtained by concatenation of the codes (10.1), (10.2), and requires 9 qubits:

$$\begin{aligned} |0\rangle &\to \frac{1}{2\sqrt{2}}(|000\rangle + |111\rangle)(|000\rangle + |111\rangle)(|000\rangle + |111\rangle), \\ |1\rangle &\to \frac{1}{2\sqrt{2}}(|000\rangle - |111\rangle)(|000\rangle - |111\rangle)(|000\rangle - |111\rangle). \end{aligned} \tag{10.3}$$

Remarkably, it turns out that this code corrects not only bit and phase errors but also an arbitrary error in one of the nine qubits, since such an error can be represented as a combination of these two basic errors. This follows from the fact that the general conditions for error correction discussed in Section 10.1.3 are satisfied.

10.1.2 General formulation 

Let $S$ be an arbitrary state in a Hilbert space $\mathcal{H}$. A quantum code is an isometric map $V : \mathcal{H} \to \mathcal{N}$, transforming states $S$ into encoded states $VSV^*$ in another Hilbert space $\mathcal{N}$. The system $\mathcal{N}$ is subject to errors, the effect of which is described by completely positive maps $\Phi$ in $\mathfrak{T}(\mathcal{N})$ that belong to a certain class $\mathcal{E}$. As we know, irreversible evolutions are described by CP maps that satisfy the normalization $\Phi(I) \le I$ (see Section 6.5), but in the present situation this restriction is not essential, since the effect of an error is determined up to a constant factor.

Thus, transformations of the quantum information are given by the diagram

$$S \longrightarrow VSV^* \longrightarrow \Phi(VSV^*), \qquad \Phi \in \mathcal{E}.$$

Definition 10.1. The code $V$ corrects errors from $\mathcal{E}$ if there exists a reverse channel $\mathcal{D}$ such that

$$\mathcal{D}[\Phi(VSV^*)] = c(\Phi)S, \tag{10.4}$$

for an arbitrary state $S$ and an arbitrary $\Phi \in \mathcal{E}$, where $c(\Phi)$ is a constant depending only on $\Phi$.

In fact, the code is determined by the subspace $\mathcal{L} = V\mathcal{H} \subset \mathcal{N}$ and can be defined as such, without introducing $\mathcal{H}$ and $V$.

Exercise 10.2. Storage of quantum information in the memory of a quantum computer. Let $\mathcal{N} = \mathcal{H}_2^{\otimes n}$ be the quantum register in which the quantum information from the subspace $\mathcal{H}$ is to be stored. Consider errors involving no more than $m$ qubits of the register. The corresponding class $\mathcal{E}(n,m)$ consists of the maps $\Phi = \Phi_1 \otimes \cdots \otimes \Phi_n$, where the number of $\Phi_k \ne \mathrm{Id}$ is not greater than $m$, while the error in the $k$-th qubit can be an arbitrary completely positive map. Operators of elementary errors in each qubit are usually given by the Pauli matrices, with $\sigma_x$ describing the bit flip, $\sigma_z$ the phase flip, and $\sigma_y = i\sigma_x\sigma_z$ their combination. Together with the unit operator $\sigma_0 = I$, corresponding to absence of error, they form a basis in the algebra of observables of the qubit.

The Shor code demonstrates the possibility of error correction for the class $\mathcal{E}(n,1)$, if $n$ is large enough (one can show that the smallest possible value of $n$ for a code that corrects one error is 5). Restriction to the correction of only one error is, of course, essential here. However, it can be shown that there exist codes that correct errors from $\mathcal{E}(n,m)$, where the ratio $m/n$ is greater than a certain positive number for large enough $n$.

10.1.3 Necessary and sufficient conditions for error correction 

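The class $\mathcal{E}$ of errors usually has the following structure. It consists of mappings of the form $\Phi[S] = \sum_j V_j S V_j^*$, where $V_j \in \mathrm{Lin}(B_1,\dots,B_p)$, and $B_j$ are the fixed, linearly independent operators of elementary errors.

Theorem 10.3 (Knill and Laflamme [133]). The following statements are equivalent:

i. the code $\mathcal{L}$ corrects errors from $\mathcal{E}$;

ii. the code $\mathcal{L}$ corrects the error $\Phi[S] = \sum_{j=1}^{p} B_j S B_j^*$;

iii. for all $\varphi, \psi \in \mathcal{L}$ such that $\langle\varphi|\psi\rangle = 0$, one has $\langle\varphi|B_i^* B_j|\psi\rangle = 0$ for all $i,j = 1,\dots,p$;

iv. for an orthonormal basis $\{|k\rangle\}$ in $\mathcal{L}$ one has

$$\langle k|B_i^* B_j|k\rangle = \langle l|B_i^* B_j|l\rangle \quad \text{for all } i,j;\ k,l, \qquad \langle k|B_i^* B_j|l\rangle = 0 \quad \text{for } k \ne l;$$

v. $P_{\mathcal{L}} B_i^* B_j P_{\mathcal{L}} = b_{ij} P_{\mathcal{L}}$, where $P_{\mathcal{L}}$ is the projector onto $\mathcal{L}$.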

Proof.

i. ⇒ ii. is obvious.

ii. ⇒ iii. Let there exist a reverse channel $\mathcal{D}[S] = \sum_r R_r S R_r^*$ for the error $\Phi$. Consider a pure input state $S = |\psi\rangle\langle\psi|$, $|\psi\rangle \in \mathcal{L}$. In this case,

$$\sum_r \sum_j R_r B_j |\psi\rangle\langle\psi| B_j^* R_r^* = c\,|\psi\rangle\langle\psi|.$$

But this is possible only if $R_r B_j|\psi\rangle = c_{jr}|\psi\rangle$, in which case $c = \sum_{r,j} |c_{jr}|^2$.

Let $|\varphi\rangle, |\psi\rangle \in \mathcal{L}$ be two orthogonal vectors. Since the reverse channel is trace preserving, we have $I = \sum_r R_r^* R_r$. Therefore,

$$\langle\varphi|B_i^* B_j|\psi\rangle = \sum_r \langle\varphi|B_i^* R_r^* R_r B_j|\psi\rangle = \sum_r \bar{c}_{ir} c_{jr} \langle\varphi|\psi\rangle = 0.$$

iii. ⇒ iv. The second equality follows trivially, while the first one is obtained by letting $|\varphi\rangle = |k + l\rangle$, $|\psi\rangle = |k - l\rangle$.

iv. ⇒ v. Denoting $b_{ij} = \langle k|B_i^* B_j|k\rangle$, we can write the condition iv. as

$$\langle k|B_i^* B_j|l\rangle = \delta_{kl} b_{ij},$$

which is equivalent to v.

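v. ⇒ i. The matrix $[b_{ij}]$ is Hermitian positive, hence there exists a unitary matrix $[u_{ij}]$ such that

$$\sum_{ij} \bar{u}_{ir} b_{ij} u_{js} = \delta_{rs}\lambda_r,$$

where $\lambda_r \ge 0$. Introducing $\tilde{B}_s = \sum_j u_{js} B_j$, we have $\Phi[S] = \sum_{s=1}^{p} \tilde{B}_s S \tilde{B}_s^*$ and

$$P_{\mathcal{L}} \tilde{B}_r^* \tilde{B}_s P_{\mathcal{L}} = \delta_{rs}\lambda_r P_{\mathcal{L}}.$$

Thus, for $\lambda_s > 0$ we have

$$\lambda_s^{-1/2} \tilde{B}_s P_{\mathcal{L}} = U_s P_{\mathcal{L}},$$

where $U_s$ are the partially isometric operators that map $\mathcal{L}$ onto mutually orthogonal subspaces $\mathcal{L}_s \subset \mathcal{N}$. It follows that

$$\Phi[S] = \sum_{s=1}^{p} \lambda_s U_s S U_s^* \tag{10.5}$$

for any state $S$ with $\operatorname{supp} S \subset \mathcal{L}$.

Let $P_s$ be the projectors onto the subspaces $\mathcal{L}_s$, and denote $P_0 = I - \sum_s P_s$. We define the channel

$$\mathcal{D}[S] = \sum_s U_s^* P_s S P_s U_s + P_0 S P_0$$

and wish to show that it is the reverse for all errors in $\mathcal{E}$. However, a linear combination of elementary errors $B_j$ is also a linear combination of $\tilde{B}_r$ and, taking into account that $U_s^* P_s \tilde{B}_r P_{\mathcal{L}} = \lambda_s^{1/2}\delta_{sr} P_{\mathcal{L}}$, we obtain (10.4) for arbitrary $\Phi \in \mathcal{E}$. □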

Exercise 10.4. Check the conditions iv. for the Shor code (10.3) and the elementary error operators given by the Pauli matrices acting on one arbitrary qubit.
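A numerical check of conditions iv. for the Shor code can complement the analytic solution of Exercise 10.4. The sketch below is one possible way to organize the computation, not a prescribed method: it forms the vectors $B_j|k\rangle$ for the identity and the 27 single-qubit Pauli errors, and compares the resulting Gram matrices $\langle k|B_i^* B_j|l\rangle$. All function names are introduced here for illustration.

```python
import numpy as np
from functools import reduce

def kron_all(factors):
    return reduce(np.kron, factors)

ket0 = np.array([1., 0.])
ket1 = np.array([0., 1.])
ghz_plus = (kron_all([ket0] * 3) + kron_all([ket1] * 3)) / np.sqrt(2)
ghz_minus = (kron_all([ket0] * 3) - kron_all([ket1] * 3)) / np.sqrt(2)
# Encoded basis vectors of the Shor code (10.3), each of dimension 2^9.
codewords = [kron_all([ghz_plus] * 3), kron_all([ghz_minus] * 3)]

paulis = {
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def apply_single(op, qubit, state, n=9):
    """Apply a one-qubit operator to the given qubit of an n-qubit state vector."""
    tensor = state.reshape([2] * n)
    tensor = np.tensordot(op, tensor, axes=([1], [qubit]))
    tensor = np.moveaxis(tensor, 0, qubit)
    return tensor.reshape(-1)

def error_images(state):
    """Rows are B_j|state> for B_0 = I and the 27 single-qubit Pauli errors."""
    state = state.astype(complex)
    images = [state]
    images += [apply_single(paulis[name], q, state)
               for q in range(9) for name in "XYZ"]
    return np.array(images)

img0, img1 = (error_images(c) for c in codewords)

# gram_kl[i, j] = <k| B_i^* B_j |l>
gram_00 = img0.conj() @ img0.T
gram_11 = img1.conj() @ img1.T
gram_01 = img0.conj() @ img1.T

print(np.allclose(gram_00, gram_11))   # first condition in iv.
print(np.allclose(gram_01, 0))         # second condition in iv.
```

The common matrix `gram_00` plays the role of $[b_{ij}]$ in condition v. It is not diagonal here, because phase flips on different qubits of the same block act identically on the code.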
10.1.4 Coherent information and perfect error correction

We now consider an arbitrary channel $\Phi$ from $A$ to $B$. Let $S = S_A$ be an input state of the channel. Introduce the reference system $R$ so that $S_{AR} = |\psi_{AR}\rangle\langle\psi_{AR}|$ is a purification of $S_A$. Let $V : \mathcal{H}_A \to \mathcal{H}_B \otimes \mathcal{H}_E$ be the Stinespring isometry (6.7) for the channel $\Phi$, where $E$ is the environment.

An important component of the mutual information $I(S,\Phi)$ is the coherent information

$$\begin{aligned} I_c(S,\Phi) &= H(\Phi[S]) - H(S;\Phi) \\ &= H(B) - H(E) \\ &= H(B) - H(RB) \\ &= -H(R|B). \end{aligned}$$

As we shall see later, it is closely related to the quantum capacity of the channel $\Phi$.
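The formula $I_c(S,\Phi) = H(B) - H(E)$ is easy to evaluate once a Kraus representation $\Phi[S] = \sum_i V_i S V_i^*$ is available, because the environment state then has matrix elements $\operatorname{Tr} V_i S V_j^*$. The sketch below assumes this standard construction and, purely as an example, the qubit depolarizing channel $\Phi[S] = (1-p)S + p\,I/2$; entropies are in bits.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy in bits."""
    eig = np.linalg.eigvalsh(rho)
    eig = eig[eig > 1e-12]
    return float(-np.sum(eig * np.log2(eig)))

def coherent_information(state, kraus):
    """I_c(S, Phi) = H(Phi[S]) - H(E), with E the environment output state."""
    output = sum(V @ state @ V.conj().T for V in kraus)
    env = np.array([[np.trace(Vi @ state @ Vj.conj().T) for Vj in kraus]
                    for Vi in kraus])
    return entropy(output) - entropy(env)

def depolarizing_kraus(p):
    """Kraus operators of the illustrative channel S -> (1 - p) S + p I/2."""
    I2 = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    return [np.sqrt(1 - 3 * p / 4) * I2] + [np.sqrt(p / 4) * P for P in (X, Y, Z)]

chaotic = np.eye(2) / 2
values = {p: coherent_information(chaotic, depolarizing_kraus(p))
          for p in (0.0, 0.5, 1.0)}
print(values)
```

For the ideal channel ($p = 0$) the coherent information equals $H(S) = 1$ bit, while for the completely depolarizing channel ($p = 1$) it drops to $-1$, showing that the quantity can indeed be negative.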

Somewhat disappointingly, the coherent information does not have several “natural” information properties, such as concavity in $S$, subadditivity, or the second chain rule. Moreover, it can take negative values. Its classical analog is never positive, $H(Y) - H(XY) = -H(X|Y) \le 0$. However, $I_c(S,\Phi)$ is convex in $\Phi$ and satisfies the first chain rule:

$$I_c(S, \Phi_2 \circ \Phi_1) \le I_c(S, \Phi_1), \tag{10.6}$$

as can be seen from the relation $I_c(S,\Phi) = I(S,\Phi) - H(S)$, and from the corresponding properties for the quantum mutual information $I(S,\Phi)$. Let us see why one should not expect the second chain rule: $I_c(S, \Phi_2 \circ \Phi_1) \le I_c(\Phi_1[S], \Phi_2)$. Assuming that this holds, we would have

$$H(\Phi_2 \circ \Phi_1[S]) - H(S; \Phi_2 \circ \Phi_1) \le H(\Phi_2 \circ \Phi_1[S]) - H(\Phi_1[S]; \Phi_2),$$

i.e. $H(S; \Phi_2 \circ \Phi_1) \ge H(\Phi_1[S]; \Phi_2)$. But this is the same as $H(E_1E_2) \ge H(E_2)$, where $E_j$ is the environment for the $j$-th channel. However, such a monotonicity does not hold in general for quantum entropy, as we have noticed in Section 7.5.

Exercise 10.5. Show that subadditivity of $I_c(S,\Phi)$ is equivalent to the following inequality

$$H(B_1B_2) - H(B_1) - H(B_2) \le H(E_1E_2) - H(E_1) - H(E_2),$$

which need not hold in general.

There is a close relationship between the perfect transmission of quantum information, error correction and the coherent information.

Definition 10.6. The channel $\Phi$ is called perfectly reversible on the state $S = S_A$, if there is a reverse channel $\mathcal{D}$ from $B$ to $A$, such that

$$(\mathcal{D} \circ \Phi \otimes \mathrm{Id}_R)[S_{AR}] = S_{AR}.$$

Proposition 10.7. The following conditions are equivalent:

i. the channel $\Phi$ is perfectly reversible on the state $S$;

ii. $I(R;E) = 0$, i.e. $S_{RE} = S_R \otimes S_E$ (where $S_{RE} = \operatorname{Tr}_B S_{BRE}$);

iii. $I_c(S,\Phi) = H(S)$;

iv. the channel $\Phi$ admits a representation (10.5) on the support of the state $S$.

The meaning of condition ii. is that no information goes to the environment, i.e. the channel is “private”. Thus, the perfect reversibility of the channel and its perfect privacy are equivalent. Condition iii. means that for perfect transmission by the channel $\Phi$ of the state $S$, the coherent information $I_c(S,\Phi)$ should reach its maximal possible value $H(S)$. In particular, by taking $S = \bar{S}_A = I_A/d_A$, the chaotic state, we obtain the condition $\log d_A = I_c(\bar{S}_A, \Phi)$. This indicates that the coherent information is the quantity relevant to the quantum capacity of the channel $\Phi$, i.e. the capacity for perfect transmission of quantum states. In the next sections, we shall see that for approximately perfect transmission by the memoryless channel $\Phi^{\otimes n}$, an appropriate analog of these conditions should hold asymptotically, with $n \to \infty$.

Proof.

i. ⇒ ii. The vector $|\psi_{BRE}\rangle = (V \otimes I_R)|\psi_{AR}\rangle$ is a vector of the pure state $S_{BRE}$ of the composite system $BRE$. The perfect reversibility implies

$$(\mathcal{D} \otimes \mathrm{Id}_{RE})[S_{BRE}] = S_{ARE}.$$

Since the partial state $S_{AR}$ is pure, by Corollary 3.12 $S_{ARE} = S_{AR} \otimes S_E$. Taking the partial trace with respect to $A$, we obtain

$$S_{RE} = S_R \otimes S_E, \tag{10.7}$$

that is ii.

ii. ⇔ iii. We have

$$\begin{aligned} H(S) - I_c(S,\Phi) &= H(A) + H(E) - H(B) && (10.8) \\ &= H(R) + H(E) - H(RE) = I(R;E) \ge 0, && (10.9) \end{aligned}$$

with equality if and only if $I(R;E) = 0$, i.e. $S_{RE} = S_R \otimes S_E$.

ii. ⇒ i. The vector $|\psi_{BRE}\rangle = (V \otimes I_R)|\psi_{AR}\rangle$ is the vector of the joint pure state $S_{BRE}$ of the system $BRE$, which is a purification of the state $S_{RE}$. Starting from the right hand side of (10.7), we obtain another purification $|\psi_{AR}\rangle \otimes |\psi_{EE'}\rangle$, where $E'$ is a purifying system for $S_E$. By choice of $E'$ we can always assume that this second purification has dimensionality not less than that of the first. Hence, by the remark after Theorem 3.10, there is an isometric operator $W : \mathcal{H}_B \to \mathcal{H}_A \otimes \mathcal{H}_{E'}$ such that

$$(I_{RE} \otimes W)|\psi_{BRE}\rangle = |\psi_{AR}\rangle \otimes |\psi_{EE'}\rangle. \tag{10.10}$$

Taking the partial trace with respect to $E, E'$ of the corresponding density operator, we obtain the perfect reversibility

$$(\mathrm{Id}_R \otimes \mathcal{D}) \circ (\mathrm{Id}_R \otimes \Phi)[S_{AR}] = S_{AR}, \tag{10.11}$$

where the reverse channel is defined as

$$\mathcal{D}[S_B] = \operatorname{Tr}_{E'} W S_B W^*. \tag{10.12}$$

i. ⇔ iv. Later, we shall prove Corollary 10.14, the statement i. ⇔ iii. of which, with the replacement of $\Phi$ by $\mathcal{D} \circ \Phi$, implies that the channel $\Phi$ is perfectly reversible on the state $S$ if and only if

$$\mathcal{D} \circ \Phi[\tilde{S}] = \tilde{S}$$

for all $\tilde{S}$ with $\operatorname{supp} \tilde{S} \subset \mathcal{L} = \operatorname{supp} S$, where $\operatorname{supp} S$ is the support of the density operator $S$ (see Section 1.3). We can reformulate this property by saying that $\Phi$ is perfectly reversible on the subspace $\mathcal{L}$. In other words, the subspace $\mathcal{L}$ is a quantum code that corrects the error $\Phi$ (and hence, all errors of the associated class $\mathcal{E}$). The statement then follows from the proof of Theorem 10.3. □

The following proposition provides yet another characterization in terms of com¬ 
plementary channels (see Section 6.6), which prepares the ground for the Coding 
Theorem for the private classical capacity of a quantum channel, in Section 10.4.1. 

Proposition 10.8. For a given Stinespring dilation of the channel $\Phi$ the following conditions are equivalent:

i. the channel $\Phi$ is perfectly reversible on the subspace $\mathcal{L}$;

ii. the complementary channel $\tilde{\Phi}$ is completely depolarizing on $\mathcal{L}$, i.e.

$$\tilde{\Phi}[S] = S_E, \tag{10.13}$$

for any state $S$ with $\operatorname{supp} S \subset \mathcal{L}$, where $S_E$ is a fixed state.

Proof.

i. ⇒ ii. Assuming $\Phi$ is perfectly reversible, the relation (10.10) holds, with an isometric $W$, for all $|\psi_{RA}\rangle$ having a tensor decomposition with $A$-components in $\mathcal{L}$. Here, $|\psi_{EE'}\rangle$ cannot depend on the input $|\psi_{RA}\rangle$, since otherwise the right-hand side would be nonlinear in $|\psi_{RA}\rangle$. Taking the partial trace with respect to $RAE'$ of the corresponding density operators, we obtain (10.13).

ii. ⇒ i. We have a Stinespring dilation $\tilde{\Phi}[S] = \operatorname{Tr}_B VSV^*$, where $B$ plays the role of an “environment” for the environment system. Let (10.13) hold and let $|\psi_{EE'}\rangle$ be a purification of $S_E$. In this case, there is another Stinespring dilation for the restriction of the channel $\tilde{\Phi}$ to the states with support in $\mathcal{L}$:

$$\tilde{\Phi}[S] = \operatorname{Tr}_{AE'} V'SV'^*, \tag{10.14}$$

where $V' = I_A \otimes |\psi_{EE'}\rangle$ is an isometric operator from $\mathcal{H}_A$ to $\mathcal{H}_{AEE'} = \mathcal{H}_E \otimes \mathcal{H}_{AE'}$, acting as $V'|\psi\rangle = |\psi\rangle \otimes |\psi_{EE'}\rangle$. In this dilation, the role of an “environment” for the environment $E$ is now played by $AE'$. By Theorem 6.12, there is a partial isometry $W : \mathcal{H}_B \to \mathcal{H}_{AE'}$ such that

$$(W \otimes I_E)V = V' = I_A \otimes |\psi_{EE'}\rangle.$$

It follows that relation (10.10) holds for all $|\psi_{RA}\rangle$ having a tensor decomposition with $A$-components in $\mathcal{L}$. By possibly extending the space $\mathcal{H}_{E'}$ we can replace $W$ with an isometry that preserves relation (10.10). Following an argument similar to that in the proof of the previous proposition, we take the partial trace of the corresponding density operator with respect to $REE'$ and obtain the perfect reversibility on $\mathcal{L}$:

$$\mathcal{D} \circ \Phi[S] = S, \tag{10.15}$$

where the reverse channel is defined by (10.12). □

10.2 Fidelities for quantum information 

As a measure of exactness of the transmission of a quantum state $S$ by the channel $\Phi$ one could take the trace norm $\|S - \Phi[S]\|_1$. However, in quantum information theory
there are a number of other useful quantities playing the same role as the probability 
of a correct decision in the classical case. Let us first consider these quantities and 
study the relations between them. 

10.2.1 Fidelities for pure states 

Lemma 10.9. Let $\psi$ be a unit vector and $S$ an arbitrary state. In this case, the following inequalities hold:

$$2[1 - \langle\psi|S|\psi\rangle] \le \big\||\psi\rangle\langle\psi| - S\big\|_1 \le 2\sqrt{1 - \langle\psi|S|\psi\rangle}. \tag{10.16}$$

If $S = |\varphi\rangle\langle\varphi|$ is a pure state, the second inequality turns into equality, i.e.

$$\big\||\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|\big\|_1 = 2\sqrt{1 - |\langle\psi|\varphi\rangle|^2}. \tag{10.17}$$

Proof. By using (1.19) we have

$$\big\||\psi\rangle\langle\psi| - S\big\|_1 = \max_U \left|\operatorname{Tr}(|\psi\rangle\langle\psi| - S)U\right|,$$

where the maximum is over all unitary operators $U$. Taking $U = 2|\psi\rangle\langle\psi| - I$ produces the first inequality.

The equality (10.17) is obtained similarly to relation (2.44). To compute the trace norm, it is sufficient to find the eigenvalues of the operator $|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|$, which has rank 2 (see Proposition 2.26).

Now, let $S$ be an arbitrary density operator and consider its spectral decomposition $S = \sum_i \lambda_i S_i$. By using the convexity of the norm, the concavity of the square root, and relation (10.17), we obtain

$$\big\||\psi\rangle\langle\psi| - S\big\|_1 \le \sum_i \lambda_i \big\||\psi\rangle\langle\psi| - S_i\big\|_1 = 2\sum_i \lambda_i \sqrt{1 - \langle\psi|S_i|\psi\rangle} \le 2\sqrt{1 - \langle\psi|S|\psi\rangle}.$$

□


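For a pure state $|\psi\rangle\langle\psi|$ and an arbitrary state $S$, the quantity

$$F(|\psi\rangle\langle\psi|, S) = \langle\psi|S|\psi\rangle \tag{10.18}$$

is called the fidelity between the pure state $|\psi\rangle\langle\psi|$ and the state $S$. Clearly, $F \le 1$, with equality if and only if $S = |\psi\rangle\langle\psi|$. Note that in quantum data compression (Section 5.5) we in fact used the fidelity (10.18). There, the state $|\psi_i\rangle\langle\psi_i|$ appeared with probability $p_i$. Hence, the average fidelity is just $\bar{F} = \sum_i p_i \langle\psi_i|S_i|\psi_i\rangle$.

In connection with error-correcting codes, a natural definition of fidelity is as follows. Let a subspace (quantum code) $\mathcal{L} \subset \mathcal{H}$ and a channel $\Phi$ in $\mathcal{H}$, for which the input and output spaces coincide, $\mathcal{H}_B = \mathcal{H}_A = \mathcal{H}$, be given. The subspace fidelity is defined as

$$F_s(\mathcal{L}, \Phi) = \min_{\psi \in \mathcal{L},\, \|\psi\| = 1} F\big(|\psi\rangle\langle\psi|, \Phi[|\psi\rangle\langle\psi|]\big) = \min_{\psi \in \mathcal{L},\, \|\psi\| = 1} \langle\psi|\Phi[|\psi\rangle\langle\psi|]|\psi\rangle.$$

By Lemma 10.9,

$$2[1 - F_s(\mathcal{L}, \Phi)] \le \max_{\psi \in \mathcal{L},\, \|\psi\| = 1} \big\||\psi\rangle\langle\psi| - \Phi[|\psi\rangle\langle\psi|]\big\|_1 \le 2\sqrt{1 - F_s(\mathcal{L}, \Phi)}. \tag{10.19}$$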
Another important quantity is the entanglement fidelity $F_e(S, \Phi)$, defined as follows. Let us purify the state $S = S_A$ to $|\psi_{AR}\rangle\langle\psi_{AR}|$ in the Hilbert space $\mathcal{H}_A \otimes \mathcal{H}_R$ and consider the fidelity for this pure state under the action of the trivial extension of the channel:

$$F_e(S, \Phi) = \langle\psi_{AR}|(\Phi \otimes \mathrm{Id}_R)[|\psi_{AR}\rangle\langle\psi_{AR}|]|\psi_{AR}\rangle.$$

There is another convenient expression for $F_e(S, \Phi)$, which also shows that this definition does not depend on the way the initial state is purified.

Lemma 10.10. Let the channel have the Kraus representation

$$\Phi[S] = \sum_i V_i S V_i^*, \tag{10.20}$$

then

$$F_e(S, \Phi) = \sum_i |\operatorname{Tr} V_i S|^2. \tag{10.21}$$

Proof. Indeed,

$$\sum_i \langle\psi_{AR}|(V_i \otimes I_R)|\psi_{AR}\rangle\langle\psi_{AR}|(V_i \otimes I_R)^*|\psi_{AR}\rangle = \sum_i \left|\langle\psi_{AR}|(V_i \otimes I_R)|\psi_{AR}\rangle\right|^2 = \sum_i |\operatorname{Tr} V_i S|^2. \qquad □$$

Notice that for a pure state $S = |\psi\rangle\langle\psi|$,

$$F_e(S, \Phi) = \langle\psi|\Phi[|\psi\rangle\langle\psi|]|\psi\rangle = F\big(|\psi\rangle\langle\psi|, \Phi[|\psi\rangle\langle\psi|]\big).$$

Exercise 10.11. Prove that $F_e(S, \Phi)$ is a convex function of $S$. Hint: By (10.21), $F_e(S, \Phi)$ is the sum of squared moduli of affine functions of $S$.
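Lemma 10.10 can be checked numerically by computing $F_e$ in two ways: from the definition with an explicit purification, and from the Kraus formula (10.21). The sketch below uses an amplitude damping channel and a particular qubit state purely as illustrative choices; the purification $(\sqrt{S} \otimes I)\sum_k |k\rangle|k\rangle$ is one convenient option among many.

```python
import numpy as np

def purification(state):
    """Vector (sqrt(S) (x) I) sum_k |k>|k> in H_A (x) H_R, with R a copy of A."""
    w, v = np.linalg.eigh(state)
    root = (v * np.sqrt(np.clip(w, 0, None))) @ v.conj().T
    d = state.shape[0]
    return np.kron(root, np.eye(d)) @ np.eye(d).reshape(d * d)

def fe_definition(state, kraus):
    """<psi| (Phi (x) Id_R)[|psi><psi|] |psi> for a purification psi of the state."""
    psi = purification(state)
    d = state.shape[0]
    rho = np.outer(psi, psi.conj())
    out = sum(np.kron(V, np.eye(d)) @ rho @ np.kron(V, np.eye(d)).conj().T
              for V in kraus)
    return float(np.real(psi.conj() @ out @ psi))

def fe_kraus(state, kraus):
    """Formula (10.21): sum_i |Tr V_i S|^2."""
    return float(sum(abs(np.trace(V @ state)) ** 2 for V in kraus))

g = 0.3                                   # damping parameter (illustrative)
kraus = [np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex),
         np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)]
S = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)

print(fe_definition(S, kraus), fe_kraus(S, kraus))
```

The two values agree up to rounding, and replacing `purification` by any other purification of `S` would leave `fe_definition` unchanged, as the lemma asserts.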

10.2.2 Relations between the fidelity measures 
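Lemma 10.12. For an arbitrary state $S$

$$1 - F_e(S, \Phi) \le 4\sqrt{1 - F_s(\operatorname{supp} S, \Phi)}.$$

Proof. We have

$$\begin{aligned} 1 - F_e(S, \Phi) &= 1 - \langle\psi_{AR}|(\Phi \otimes \mathrm{Id}_R)[|\psi_{AR}\rangle\langle\psi_{AR}|]|\psi_{AR}\rangle \\ &= \langle\psi_{AR}|((\mathrm{Id}_A - \Phi) \otimes \mathrm{Id}_R)[|\psi_{AR}\rangle\langle\psi_{AR}|]|\psi_{AR}\rangle. \end{aligned}$$

Representing $|\psi_{AR}\rangle = \sum_j |\psi_j\rangle \otimes |e_j\rangle$, where $\{|e_j\rangle\}$ is an orthonormal basis in $\mathcal{H}_R$, $\sum_j \|\psi_j\|^2 = 1$, $\psi_j \in \operatorname{supp} S$, we obtain the equivalent expression

$$\sum_{jk} \langle\psi_j|(\mathrm{Id} - \Phi)[|\psi_j\rangle\langle\psi_k|]|\psi_k\rangle = \sum_{jk} \operatorname{Tr} |\psi_k\rangle\langle\psi_j| \big(|\psi_j\rangle\langle\psi_k| - \Phi[|\psi_j\rangle\langle\psi_k|]\big).$$

Taking into account inequality (1.18) and the fact that the operators $|\psi_j\rangle\langle\psi_k|$ all have both trace and operator norms equal to $\|\psi_j\|\,\|\psi_k\|$, we obtain that this does not exceed

$$\max_{T \ne 0} \frac{\|T - \Phi[T]\|_1}{\|T\|_1},$$

where the maximum is taken over all nonzero operators $T$ acting in $\mathcal{L} = \operatorname{supp} S$. By decomposing $T = T_1 + iT_2$, where $T_1^* = T_1$, $T_2^* = T_2$ again act in $\mathcal{L}$, and taking into account that $\|T_{1,2}\|_1 \le \|T\|_1$, as well as the triangle inequality, we infer that this does not exceed the following quantity:

$$2\max_{T^* = T} \frac{\|T - \Phi[T]\|_1}{\|T\|_1} = 2\max_{S:\, \operatorname{supp} S \subset \mathcal{L}} \|S - \Phi[S]\|_1. \tag{10.22}$$

Here, the inequality $\ge$ is obtained by restricting ourselves to positive $T$. On the other hand, writing $T = T_+ - T_-$ for the decomposition of a Hermitian $T$ into its positive and negative parts,

$$\max_{T^* = T} \frac{\|T - \Phi[T]\|_1}{\|T\|_1} \le \max_{T^* = T} \frac{p_+\|S_+ - \Phi[S_+]\|_1 + p_-\|S_- - \Phi[S_-]\|_1}{p_+ + p_-}, \tag{10.23}$$

where we denoted $p_\pm = \operatorname{Tr} T_\pm$, $S_\pm = p_\pm^{-1} T_\pm$. Here, the right hand side is less than or equal to the right hand side in the relation (10.22), which is thus proved. Finally,

$$\max_{S:\, \operatorname{supp} S \subset \mathcal{L}} \|S - \Phi[S]\|_1 = \max_{\psi \in \mathcal{L},\, \|\psi\| = 1} \big\||\psi\rangle\langle\psi| - \Phi[|\psi\rangle\langle\psi|]\big\|_1,$$

since the maximum of a convex function is attained at an extreme point of the convex set. By using the second inequality from (10.19), we obtain that the expression does not exceed $4\sqrt{1 - F_s(\mathcal{L}, \Phi)}$. □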

Exercise 10.13. By using the convexity property from Exercise 10.11, prove the inequality

$$1 - F_s(\mathcal{L}, \Phi) \le 1 - \min_{S:\, \operatorname{supp} S \subset \mathcal{L}} F_e(S, \Phi). \tag{10.24}$$

Now, let $\mathcal{L} = \mathcal{H}$. In this case,

$$\max_{T \ne 0} \frac{\|T - \Phi[T]\|_1}{\|T\|_1} = \|\mathrm{Id} - \Phi\|.$$

From the proof of Lemma 10.12, we have the chain of inequalities

$$1 - \min_S F_e(S, \Phi) \le \|\mathrm{Id} - \Phi\| \le 2\max_{\|\psi\| = 1} \big\||\psi\rangle\langle\psi| - \Phi[|\psi\rangle\langle\psi|]\big\|_1 \le 2\|\mathrm{Id} - \Phi\|. \tag{10.25}$$

Together with relation (10.24) this means that a deviation of the channel $\Phi$ from the ideal channel $\mathrm{Id}$ is equivalently described by one of the quantities

$$\|\mathrm{Id} - \Phi\|; \qquad \max_S \|S - \Phi[S]\|_1; \qquad 1 - F_s(\mathcal{H}, \Phi); \qquad 1 - \min_S F_e(S, \Phi).$$

In the consideration of the quantum capacity we will need the following obvious corollary of Lemma 10.12:

$$1 - F_e(\bar{S}, \Phi) \le 4\sqrt{1 - F_s(\mathcal{H}, \Phi)}, \tag{10.26}$$

where $\bar{S}$ is the chaotic state in $\mathcal{H}$.

It is instructive to consider the “ideal” case of the unit fidelities. 

Corollary 10.14. The following conditions are equivalent:

i. $F_e(S, \Phi) = 1$;

ii. $F_s(\operatorname{supp} S, \Phi) = 1$;

iii. $\Phi[\tilde{S}] = \tilde{S}$ for an arbitrary state $\tilde{S}$ with $\operatorname{supp} \tilde{S} \subset \operatorname{supp} S$;

iv. $F_e(\tilde{S}, \Phi) = 1$ for an arbitrary state $\tilde{S}$ with $\operatorname{supp} \tilde{S} \subset \operatorname{supp} S$.

Proof. Let us prove i. ⇒ ii. Let $|\psi\rangle \in \operatorname{supp} S$. In this case, we can write

$$S = p|\psi\rangle\langle\psi| + (1 - p)S',$$

where $p > 0$ and $S'$ is a density operator. By Exercise 10.11, the function $S \to F_e(S, \Phi)$ is convex. Therefore,

$$pF_e(|\psi\rangle\langle\psi|, \Phi) + (1 - p)F_e(S', \Phi) \ge F_e(S, \Phi) = 1,$$

and hence $F_e(|\psi\rangle\langle\psi|, \Phi) = 1$. But this means that

$$\langle\psi|\Phi[|\psi\rangle\langle\psi|]|\psi\rangle = 1$$

for all $|\psi\rangle \in \operatorname{supp} S$ and $F_s(\operatorname{supp} S, \Phi) = 1$.

Lemma 10.12 implies ii. ⇒ i. Moreover, since ii. obviously implies the same condition, with $\operatorname{supp} \tilde{S}$ instead of $\operatorname{supp} S$, it also implies iv., which is formally stronger than i.

By the definition of $F_s$, ii. is equivalent to iii. for pure, and, hence, for mixed states $\tilde{S}$. □
10.2.3 Fidelity and the Bures distance

Here, we discuss the fidelity between two arbitrary mixed states. This material will be needed only in Section 10.4.4.

For pure states $S = |\varphi\rangle\langle\varphi|$, $T = |\psi\rangle\langle\psi|$, the fidelity is equal to $F(T, S) = |\langle\varphi|\psi\rangle|^2$, and formula (10.17) provides the relation between the fidelity and the trace norm distance between $S$, $T$. Now, let $S$, $T$ be arbitrary density operators.

Definition 10.15. The fidelity between the density operators $S$, $T$ is defined by the relation

$$F(T, S) = \max |\langle\psi_T|\psi_S\rangle|^2, \tag{10.27}$$

where the maximum is taken over all possible purifications $\psi_S$, $\psi_T$ of the states $S$, $T$ (in the same Hilbert space $\mathcal{H} \otimes \mathcal{H}'$).

By using Theorem 3.11, we see that

$$F(T, S) = \max_W |\langle\psi_T|(I \otimes W)\psi_S\rangle|^2, \tag{10.28}$$

where now $\psi_S$, $\psi_T$ are fixed purifications of the states $S$, $T$ in a fixed Hilbert space $\mathcal{H} \otimes \mathcal{H}'$, and the maximum is taken over all unitary operators $W$ in $\mathcal{H}'$. In particular, taking the purifications $\sqrt{S}$, $\sqrt{T}$ in $L^2(\mathcal{H}) \simeq \mathcal{H} \otimes \mathcal{H}^*$ we obtain

$$F(T, S) = \max_W \left|\operatorname{Tr} \sqrt{S}\sqrt{T}\,W\right|^2,$$

where the maximum is over all unitary operators $W$ in $\mathcal{H}$. By using the polar decomposition of the operator $\sqrt{S}\sqrt{T} = U|\sqrt{S}\sqrt{T}|$, one sees that the maximum is achieved for $W = U^*$ and is equal to

$$F(T, S) = \left(\operatorname{Tr}|\sqrt{S}\sqrt{T}|\right)^2 = \left\|\sqrt{S}\sqrt{T}\right\|_1^2.$$

This coincides with (10.18) for $T = |\psi\rangle\langle\psi|$. Note that the above argument implies that in definition (10.27) we can restrict ourselves to maximization with respect to all purifications of only one of the states, with a fixed purification of the other.


| Exercise 10.16. Show that F(T, S ) < 1, with equality if and only if T = S. 


Related to the fidelity (10.27), there is a metric on the set of all states, called the Bures distance, namely

$$\beta(T,S) = \min \|\psi_T - \psi_S\|, \tag{10.29}$$

where the minimum is again taken over all possible purifications $\psi_S$, $\psi_T$ of the states $S$, $T$. Taking the square of the norm and using the definition (10.27), one obtains

$$\beta(T,S) = \sqrt{2\left(1 - \sqrt{F(T,S)}\right)} = \sqrt{2\left(1 - \|\sqrt{S}\sqrt{T}\|_1\right)}. \tag{10.30}$$

In particular, for pure states $T = |\psi\rangle\langle\psi|$, $S = |\varphi\rangle\langle\varphi|$,

$$\beta(T,S) = \sqrt{2\left(1 - |\langle\psi|\varphi\rangle|\right)}.$$

It also follows that, just as in the definition of the fidelity, in definition (10.29) we can restrict ourselves to minimization with respect to all purifications of only one of the states, with a fixed purification of the other.

The triangle inequality for $\beta(T,S)$ then follows from the definition (10.29) and the corresponding inequality for the norm. For pure states, it reduces to

$$\sqrt{1 - |\langle\varphi|\psi\rangle|} \le \sqrt{1 - |\langle\varphi|\chi\rangle|} + \sqrt{1 - |\langle\chi|\psi\rangle|}, \tag{10.31}$$

where $\varphi$, $\chi$, $\psi$ are arbitrary unit vectors.
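The pure-state formula for the Bures distance and the triangle inequality (10.31) are easy to check numerically. The following is a minimal sketch, assuming state vectors are given as lists of complex amplitudes; the helper names `inner`, `normalize` and `bures_pure` are illustrative and not part of the text.

```python
import math

def inner(u, v):
    """Inner product <u|v>, conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def normalize(u):
    """Rescale a nonzero vector to unit norm."""
    norm = math.sqrt(sum(abs(a) ** 2 for a in u))
    return [a / norm for a in u]

def bures_pure(u, v):
    """Bures distance sqrt(2(1 - |<u|v>|)) between pure states given by unit vectors."""
    overlap = min(1.0, abs(inner(u, v)))  # clamp to guard against rounding above 1
    return math.sqrt(2.0 * (1.0 - overlap))

# Three unit vectors in a two-dimensional space (arbitrary illustrative choices).
phi = normalize([1, 0])
chi = normalize([1, 1])
psi = normalize([1, 1j])

lhs = bures_pure(phi, psi)
rhs = bures_pure(phi, chi) + bures_pure(chi, psi)
print(lhs <= rhs)  # triangle inequality for the Bures distance
```

Since the common factor $\sqrt{2}$ cancels, this is the same comparison as (10.31).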

Lemma 10.17. For any two density operators $S_1$, $S_2$ in $\mathcal{H}$,

$$\beta(S_1,S_2)^2 \le \|S_1 - S_2\|_1 \le 2\beta(S_1,S_2). \tag{10.32}$$

The first inequality, combined with (10.30) and (10.28), implies

Corollary 10.18. For any two density operators $S_1$, $S_2$ in $\mathcal{H}$ and their given purifications $\psi_{S_1}$, $\psi_{S_2}$ in $\mathcal{H}\otimes\mathcal{H}'$,

$$\|S_1 - S_2\|_1 \ge 2\left(1 - \max_W |\langle\psi_{S_1}|(I\otimes W)\psi_{S_2}\rangle|\right), \tag{10.33}$$

where the maximum is taken over all unitary operators $W$ in $\mathcal{H}'$.

Proof of Lemma 10.17. Since $\mathrm{Tr}\,\sqrt{S_1}\sqrt{S_2} \le \mathrm{Tr}\,|\sqrt{S_1}\sqrt{S_2}|$, to prove the first inequality in (10.32) it is sufficient to prove that

$$2\left(1 - \mathrm{Tr}\,\sqrt{S_1}\sqrt{S_2}\right) \le \|S_1 - S_2\|_1.$$

This in turn follows from the more general inequality

$$\mathrm{Tr}\,(\sqrt{S_1} - \sqrt{S_2})^2 \le \|S_1 - S_2\|_1,$$

valid for arbitrary positive $S_1$, $S_2$. To prove this, for a Hermitian operator $K$, we denote by $1_+(K)$ (resp. $1_-(K)$) the spectral projector corresponding to positive (resp. non-positive) eigenvalues. In this case, $\sigma(K) = 1_+(K) - 1_-(K)$ is a unitary Hermitian operator that satisfies $\sigma(K)K = K\sigma(K) = |K|$. Denoting $K = \sqrt{S_1} - \sqrt{S_2}$, $L = \sqrt{S_1} + \sqrt{S_2}$, we have $S_1 - S_2 = \frac{1}{2}(KL + LK) = K\circ L$. Hence,

$$\|S_1 - S_2\|_1 \ge |\mathrm{Tr}\,\sigma(K)\,K\circ L| = \mathrm{Tr}\,|K|L = \mathrm{Tr}\,|K|\left(1_+(K)L1_+(K) + 1_-(K)L1_-(K)\right).$$

Since $L \ge K \ge -L$, then $1_+(K)L1_+(K) \ge 1_+(K)K$ and $1_-(K)L1_-(K) \ge -1_-(K)K$. Thus, $1_+(K)L1_+(K) + 1_-(K)L1_-(K) \ge |K|$, and therefore

$$\|S_1 - S_2\|_1 \ge \mathrm{Tr}\,|K|^2 = \mathrm{Tr}\,(\sqrt{S_1} - \sqrt{S_2})^2.$$

To prove the second inequality in (10.32), let $S_1 = T_1^*T_1$, $S_2 = T_2^*T_2$ for some not necessarily Hermitian $T_1$, $T_2$. Denoting $K = T_1 - T_2$, $L = T_1 + T_2$, we have

$$\|S_1 - S_2\|_1 = \mathrm{Tr}\,\sigma(S_1 - S_2)(S_1 - S_2) = \frac{1}{2}\left[\mathrm{Tr}\,\sigma(S_1 - S_2)K^*L + \mathrm{Tr}\,\sigma(S_1 - S_2)L^*K\right].$$

By using the Cauchy-Schwarz inequality for the trace, we obtain

$$|\mathrm{Tr}\,\sigma(S_1 - S_2)K^*L| \le \left[\mathrm{Tr}\,\sigma(S_1 - S_2)^2K^*K\cdot\mathrm{Tr}\,L^*L\right]^{1/2} = \left[\mathrm{Tr}\,K^*K\cdot\mathrm{Tr}\,L^*L\right]^{1/2},$$

and similarly for the second term. Thus, $\|S_1 - S_2\|_1 \le [\mathrm{Tr}\,K^*K\cdot\mathrm{Tr}\,L^*L]^{1/2}$. Using the normalization of $S_1$, $S_2$, we obtain

$$\mathrm{Tr}\,K^*K = 2\left(1 - \Re\,\mathrm{Tr}\,T_1^*T_2\right), \qquad \mathrm{Tr}\,L^*L = 2\left(1 + \Re\,\mathrm{Tr}\,T_1^*T_2\right).$$

Take $T_1 = \sqrt{S_1}$, $T_2 = U\sqrt{S_2}$, where $U$ is the unitary from the polar decomposition of $\sqrt{S_1}\sqrt{S_2}$; then $\Re\,\mathrm{Tr}\,T_1^*T_2 = \Re\,\mathrm{Tr}\,T_2^*T_1 = \mathrm{Tr}\,|\sqrt{S_1}\sqrt{S_2}|$. Thus,

$$\mathrm{Tr}\,K^*K = 2\left(1 - \|\sqrt{S_1}\sqrt{S_2}\|_1\right) = \beta(S_1,S_2)^2,$$

$$\mathrm{Tr}\,L^*L = 2\left(1 + \|\sqrt{S_1}\sqrt{S_2}\|_1\right) \le 4,$$

and the second inequality in (10.32) follows.


□ 
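As a sanity check of Lemma 10.17, the two inequalities in (10.32) can be verified numerically in the simplest case of commuting states. The sketch below assumes that both density operators are diagonal in the same basis, so that they are described by probability vectors `p` and `q`; in that case $\|\sqrt{S_1}\sqrt{S_2}\|_1 = \sum_i\sqrt{p_iq_i}$ and $\|S_1 - S_2\|_1 = \sum_i|p_i - q_i|$. The function names are illustrative, and the general noncommuting case would need a matrix square root, which is not attempted here.

```python
import math

def root_fidelity_diag(p, q):
    """sqrt(F) = sum_i sqrt(p_i q_i) for commuting states with spectra p and q."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def bures_diag(p, q):
    """Bures distance from formula (10.30), commuting case only."""
    root_f = min(1.0, root_fidelity_diag(p, q))  # clamp rounding errors
    return math.sqrt(2.0 * (1.0 - root_f))

def trace_norm_diag(p, q):
    """Trace norm of the difference of two commuting states."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Illustrative spectra (assumed values, not taken from the text).
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

beta = bures_diag(p, q)
norm1 = trace_norm_diag(p, q)
print(beta ** 2 <= norm1 <= 2 * beta)  # both inequalities of (10.32)
```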


10.3 The quantum capacity 

10.3.1 Achievable rates 

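Let $\Phi : \mathfrak{T}(\mathcal{H}_A) \to \mathfrak{T}(\mathcal{H}_B)$ be a quantum channel. Consider the composite channel $\Phi^{\otimes n} : \mathfrak{T}(\mathcal{H}_A^{\otimes n}) \to \mathfrak{T}(\mathcal{H}_B^{\otimes n})$.

Definition 10.19. A number $R \ge 0$ is called the achievable rate (for the transmission of quantum information), if there exist a sequence of spaces $\mathcal{H}^{(n)}$ such that

$$\lim_{n\to\infty}\frac{1}{n}\log\dim\mathcal{H}^{(n)} = R,$$

and sequences of channels $\mathcal{E}^{(n)} : \mathfrak{T}(\mathcal{H}^{(n)}) \to \mathfrak{T}(\mathcal{H}_A^{\otimes n})$ (encodings) and $\mathcal{D}^{(n)} : \mathfrak{T}(\mathcal{H}_B^{\otimes n}) \to \mathfrak{T}(\mathcal{H}^{(n)})$ (decodings) such that

$$\lim_{n\to\infty} F_s\left(\mathcal{H}^{(n)}, \mathcal{D}^{(n)}\circ\Phi^{\otimes n}\circ\mathcal{E}^{(n)}\right) = 1. \tag{10.34}$$

The least upper bound of achievable rates will be denoted $Q(\Phi)$ and called the quantum capacity of the channel $\Phi$.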

The following data-processing inequality is useful for obtaining bounds on the 
quantum capacity. 

Proposition 10.20. For any two channels $\Phi_1$, $\Phi_2$,

$$Q(\Phi_2\circ\Phi_1) \le \min\{Q(\Phi_1), Q(\Phi_2)\}. \tag{10.35}$$

In particular, if one of these channels has zero quantum capacity, so does their concatenation.

Proof. To show that $Q(\Phi_2\circ\Phi_1) \le Q(\Phi_2)$, it suffices to observe that $Q(\Phi_2\circ\Phi_1)$ is equal to the supremum of the achievable rates for channel $\Phi_2$ over the special class of encodings, including postprocessing with channel $\Phi_1$. Hence, it does not exceed $Q(\Phi_2)$. The inequality $Q(\Phi_2\circ\Phi_1) \le Q(\Phi_1)$ is proved similarly by considering decodings for $\Phi_1$, including preprocessing with channel $\Phi_2$. □

Note that essentially the same reasoning implies a similar inequality for the classical capacity,

$$C(\Phi_2\circ\Phi_1) \le \min\{C(\Phi_1), C(\Phi_2)\}, \tag{10.36}$$

while for the classical entanglement-assisted capacity $C_{ea}$ this can be derived from the data-processing inequality for quantum mutual information (Proposition 7.31, property iv), taking into account expression (9.4).

In the proof of the classical Coding Theorem it was much more convenient to use the average error probability instead of the maximal error probability, as the two criteria were shown to be equivalent in Lemma 4.11. In the quantum case, the role of the average error is played by the quantity $1 - F_e(\bar S^{(n)}, \mathcal{D}^{(n)}\circ\Phi^{\otimes n}\circ\mathcal{E}^{(n)})$, where $\bar S^{(n)} = I_{\mathcal{H}^{(n)}}/\dim\mathcal{H}^{(n)}$ is the chaotic state in $\mathcal{H}^{(n)}$, while the analog of the maximal error is $1 - F_s(\mathcal{H}^{(n)}, \mathcal{D}^{(n)}\circ\Phi^{\otimes n}\circ\mathcal{E}^{(n)})$. Indeed, $F_s(\mathcal{H}^{(n)}, \mathcal{D}^{(n)}\circ\Phi^{\otimes n}\circ\mathcal{E}^{(n)}) \to 1$ implies $F_e(\bar S^{(n)}, \mathcal{D}^{(n)}\circ\Phi^{\otimes n}\circ\mathcal{E}^{(n)}) \to 1$ (according to Lemma 10.12). In the reverse direction, there is the following noncommutative analog of Lemma 4.11:

Lemma 10.21. Let $\mathcal{H}$ be a Hilbert space of dimensionality $2d$ and $\Phi$ a channel in $\mathcal{H}$. There exists a subspace $\mathcal{H}'$ of dimensionality $d$ such that

$$1 - F_s(\mathcal{H}', \mathcal{D}'\circ\Phi) \le 2\left(1 - F_e(\bar S, \Phi)\right),$$

where $\bar S = \frac{1}{2d}I_{\mathcal{H}}$, and $\mathcal{D}'[S] = P'SP' + \frac{P'}{d}\,\mathrm{Tr}\,(I - P')S$ for all $S \in \mathfrak{S}(\mathcal{H})$ with $P'$ being the projector onto $\mathcal{H}'$.


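Proof. On the unit sphere of $\mathcal{H}$, consider the continuous function $\psi \to f(\psi) = \langle\psi|\Phi[|\psi\rangle\langle\psi|]|\psi\rangle = F_e(|\psi\rangle\langle\psi|, \Phi)$. Let $|\psi_1\rangle$ be a vector minimizing $f(\psi)$. Define the orthonormal basis $\{|\psi_j\rangle;\ j = 1,\ldots,2d\}$ in $\mathcal{H} = \mathcal{H}_0$ by the following recurrent procedure: $|\psi_{j+1}\rangle$ is the vector in the subspace $\mathcal{H}_j = \{|\psi_1\rangle,\ldots,|\psi_j\rangle\}^{\perp}$ that minimizes $f(\psi)$. Then $\mathcal{H}_j \supset \mathcal{H}_{j+1}$ and $\dim\mathcal{H}_j = 2d - j$. By the convexity of $F_e(S,\Phi)$ (see Exercise 10.11), we obtain

$$1 - F_e(\bar S, \Phi) \ge \frac{1}{2d}\sum_{j=1}^{2d}\left(1 - F_e(|\psi_j\rangle\langle\psi_j|, \Phi)\right) \ge \frac{1}{2d}\sum_{j=1}^{d+1}\left(1 - f(\psi_j)\right)$$

$$\ge \frac{1}{2}\left(1 - \min_{\psi\in\mathcal{H}_d} f(\psi)\right) \ge \frac{1}{2}\left(1 - F_s(\mathcal{H}', \mathcal{D}'\circ\Phi)\right),$$

where $\mathcal{H}' = \mathcal{H}_d$. Here the third inequality uses the fact that $f(\psi_j) \le f(\psi_{d+1}) = \min_{\psi\in\mathcal{H}_d} f(\psi)$ for $j \le d+1$, since the minimum is taken over a decreasing family of subspaces.

□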


From this lemma, together with inequality (10.26), it follows that in the definition of achievable rates we can replace (10.34) by

$$\lim_{n\to\infty} F_e\left(\bar S^{(n)}, \mathcal{D}^{(n)}\circ\Phi^{\otimes n}\circ\mathcal{E}^{(n)}\right) = 1. \tag{10.37}$$

Corollary 10.22. Any PPT channel (in particular, an entanglement-breaking channel) has zero quantum capacity.


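Proof. In this proof, we denote by $T$ the operations of transposition in different spaces, which will be clear from the context. By using (6.31), we have

$$F_e\left(\bar S^{(n)}, \mathcal{D}^{(n)}\circ\Phi^{\otimes n}\circ\mathcal{E}^{(n)}\right) = F_e\left(\bar S^{(n)}, \mathcal{D}^{(n)}\circ T\circ T\circ\Phi^{\otimes n}\circ\mathcal{E}^{(n)}\right) = F_e\left(\bar S^{(n)}, T\circ\tilde\Phi_n\right),$$

where $\tilde\Phi_n = (T\circ\mathcal{D}^{(n)}\circ T)\circ(T\circ\Phi^{\otimes n})\circ\mathcal{E}^{(n)}$ is a channel, because by Proposition 6.27 $T\circ\Phi^{\otimes n}$ is a channel. Now,

$$F_e\left(\bar S^{(n)}, T\circ\tilde\Phi_n\right) = \langle\Omega^{(n)}|\left(\mathrm{Id}_{\mathcal{H}^{(n)}}\otimes T\circ\tilde\Phi_n\right)\left[|\Omega^{(n)}\rangle\langle\Omega^{(n)}|\right]|\Omega^{(n)}\rangle$$

$$= \mathrm{Tr}\,|\Omega^{(n)}\rangle\langle\Omega^{(n)}|\left(\mathrm{Id}_{\mathcal{H}^{(n)}}\otimes T\circ\tilde\Phi_n\right)\left[|\Omega^{(n)}\rangle\langle\Omega^{(n)}|\right]$$

$$= \mathrm{Tr}\left(\mathrm{Id}_{\mathcal{H}^{(n)}}\otimes T\right)\left[|\Omega^{(n)}\rangle\langle\Omega^{(n)}|\right]\left(\mathrm{Id}_{\mathcal{H}^{(n)}}\otimes\tilde\Phi_n\right)\left[|\Omega^{(n)}\rangle\langle\Omega^{(n)}|\right],$$

where

$$|\Omega^{(n)}\rangle = \frac{1}{\sqrt{d_n}}\sum_{m=1}^{d_n}|m\rangle\otimes|m\rangle \tag{10.38}$$

is the maximally entangled vector in $\mathcal{H}^{(n)}\otimes\mathcal{H}^{(n)}$.

Introducing the Kraus decomposition $\tilde\Phi_n[S] = \sum_k V_kSV_k^*$, we obtain, by using (10.38), that the last expression is evaluated as

$$\frac{1}{d_n^2}\sum_k\sum_{l,m=1}^{d_n}\langle l|V_k|m\rangle\langle l|V_k^*|m\rangle \le \frac{1}{d_n^2}\sum_k\sum_{l,m=1}^{d_n}|\langle l|V_k|m\rangle|^2 = \frac{1}{d_n^2}\sum_k\mathrm{Tr}\,V_k^*V_k = \frac{1}{d_n}.$$

Thus, for arbitrary encodings and decodings $F_e(\bar S^{(n)}, \mathcal{D}^{(n)}\circ\Phi^{\otimes n}\circ\mathcal{E}^{(n)}) \le d_n^{-1} \le 1/2$, implying $Q(\Phi) = 0$. □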

Now, let us show that in the definition of the quantum capacity we can restrict ourselves to isometric encodings, i.e. to those which have the form

$$\mathcal{E}[S] = \mathrm{Ad}\,V[S] = VSV^*, \tag{10.39}$$

where $V$ is an isometric mapping of the space $\mathcal{H}^{(n)}$ into the input space $\mathcal{H}_A^{\otimes n}$ of the channel. We shall prove that if there is an encoding of a general form which achieves high fidelity for a given state, there also is an isometric encoding with a similar property. To explain this on an intuitive level, consider the case of perfect transmission. If the concatenation encoding-channel-decoding perfectly reproduces some state $S$, the encoding channel $\mathcal{E}$ is perfectly reversible on $S$, with the concatenation channel-decoding as the reverse channel. In this case, according to Proposition 10.7 the channel $\mathcal{E}$ on $\mathrm{supp}\,S$ can be represented in the form (10.5) as a convex combination of isometric encodings $\mathcal{U}_j = \mathrm{Ad}\,U_j$. Hence, any channel $\mathcal{U}_j$ is perfectly reversible on $S$ with the same reverse channel.

Lemma 10.23. Let $S$ be a state in a Hilbert space $\mathcal{L}$, $\mathcal{L} = \mathrm{supp}\,S$, let $\mathcal{E}$ be a completely positive map from $\mathfrak{T}(\mathcal{L})$ to $\mathfrak{T}(\mathcal{H})$, such that $\mathrm{Tr}\,\mathcal{E}[S] = 1$, and let $\mathcal{A}$ be a channel from $\mathfrak{T}(\mathcal{H})$ to $\mathfrak{T}(\mathcal{L})$. Assume that $\dim\mathcal{L} \le \dim\mathcal{H}$. Then there exists an isometry $V$ from $\mathcal{L}$ to $\mathcal{H}$ such that

$$F_e(S, \mathcal{A}\circ\mathrm{Ad}\,V) \ge F_e(S, \mathcal{A}\circ\mathcal{E})^2. \tag{10.40}$$

This lemma will be used in the situation where $\mathcal{E}$ is the encoding map, while $\mathcal{A}$ is the concatenation channel-decoding. For the proof, we need the following generalization of the polar decomposition. If $X$ is an operator from $\mathcal{L}$ to $\mathcal{H}$, $\dim\mathcal{L} \le \dim\mathcal{H}$, then $X = V|X|$, where $|X| = \sqrt{X^*X}$ is a positive operator in $\mathcal{L}$ and $V$ is an isometry from $\mathcal{L}$ to $\mathcal{H}$. This implies the following representation for an arbitrary square matrix $X$: $X = VDU$, where $D$ is a diagonal matrix with nonnegative elements, and $V$, $U$ are unitary matrices.

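Proof. Let $A_i : \mathcal{H} \to \mathcal{L}$ and $E_j : \mathcal{L} \to \mathcal{H}$ be the components of the Kraus decomposition for the maps $\mathcal{A}$ and $\mathcal{E}$, respectively. Denote by $X$ the matrix with the elements $X_{ij} = \mathrm{Tr}\,A_iE_jS$. Then, by (10.21),

$$F_e(S, \mathcal{A}\circ\mathcal{E}) = \sum_{ij}|X_{ij}|^2.$$

Complementing the Kraus decompositions, if necessary, with additional zero components, we may assume that $X$ is a square matrix. The decomposition $X = VDU$ implies that by transforming the Kraus decompositions for $\mathcal{A}$ and $\mathcal{E}$, we can make the matrix $X$ diagonal. Then again using (10.21), we obtain $F_e(S, \mathcal{A}\circ\mathcal{E}) = \sum_k|\mathrm{Tr}\,A_kE_kS|^2$. Denote $\lambda_k = \mathrm{Tr}\,SE_k^*E_k$, and restrict ourselves to $\lambda_k > 0$. Then $\sum_k\lambda_k\left(|\mathrm{Tr}\,A_kE_kS|^2/\lambda_k\right) = F_e(S, \mathcal{A}\circ\mathcal{E})$ and $\sum_k\lambda_k = 1$, so that there exists a $k$ such that $|\mathrm{Tr}\,A_kE_kS|^2/\lambda_k \ge F_e(S, \mathcal{A}\circ\mathcal{E})$. Setting $E = E_k/\sqrt{\lambda_k} : \mathcal{L} \to \mathcal{H}$, $A = A_k : \mathcal{H} \to \mathcal{L}$, we have $A^*A \le I_{\mathcal{H}}$, $\mathrm{Tr}\,SE^*E = 1$ and

$$F_e(S, \mathcal{A}\circ\mathcal{E}) \le F_e(S, \mathrm{Ad}\,A\circ\mathrm{Ad}\,E) = |\mathrm{Tr}\,AES|^2.$$

Let $A^* = V|A^*|$ be the polar decomposition of the operator $A^* : \mathcal{L} \to \mathcal{H}$, where $V : \mathcal{L} \to \mathcal{H}$ is an isometry. According to the noncommutative Cauchy-Schwarz inequality (2.24),

$$|\mathrm{Tr}\,AES|^2 = |\mathrm{Tr}\,SAE|^2 \le \mathrm{Tr}\,SAA^*\cdot\mathrm{Tr}\,SE^*E = \mathrm{Tr}\,S|A^*|^2. \tag{10.41}$$

Since $A^*A \le I_{\mathcal{H}}$, also $AA^* \le I_{\mathcal{L}}$. Hence, $|A^*|^2 \le |A^*|$ and therefore $\mathrm{Tr}\,S|A^*|^2 \le \mathrm{Tr}\,S|A^*| = \mathrm{Tr}\,AVS$.

Then

$$|\mathrm{Tr}\,AVS|^2 \le \sum_k|\mathrm{Tr}\,A_kVS|^2 = F_e(S, \mathcal{A}\circ\mathrm{Ad}\,V).$$

This chain of inequalities implies (10.40). □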

Now, let some sequences of subspaces $\mathcal{H}^{(n)}$, encodings $\mathcal{E}^{(n)}$, and decodings $\mathcal{D}^{(n)}$ be given, for which (10.37) holds. In this case, by Lemma 10.23, there exists a sequence of isometric encodings $\mathcal{V}^{(n)} = \mathrm{Ad}\,V^{(n)}$ such that

$$\lim_{n\to\infty} F_e\left(\bar S^{(n)}, \mathcal{D}^{(n)}\circ\Phi^{\otimes n}\circ\mathcal{V}^{(n)}\right) = 1. \tag{10.42}$$

It follows that in the definition of the quantum capacity we can restrict ourselves to the isometric encodings.
10.3.2 The quantum capacity and the coherent information

The following fundamental result is the Coding Theorem for the quantum capacity, 
which relates it to the coherent information. 

Theorem 10.24 (Lloyd; Shor; Devetak). For any channel $\Phi$, the quantum capacity is given by the expression

$$Q(\Phi) = \lim_{n\to\infty}\frac{1}{n}\max_S I_c(S, \Phi^{\otimes n}), \tag{10.43}$$

where $I_c(S, \Phi)$ is the coherent information.

Existence of the limit can be shown, just like in the case of the classical capacity, by using superadditivity of the sequence $Q_n = \max_S I_c(S, \Phi^{\otimes n})$. Let us check that (10.43) implies the inequality relating the quantum and the classical capacities of the channel,

$$Q(\Phi) \le C(\Phi), \tag{10.44}$$

which is proved the same way as the second inequality in (9.8). Indeed, take $S$ to be a mixture of arbitrary pure states $S_j$ with probabilities $p_j$. In this case, (7.50) implies

$$I_c(S, \Phi) \le H\Big(\sum_j p_j\Phi[S_j]\Big) - \sum_j p_jH(\Phi[S_j]) = \chi\left(\{p_j\}, \{\Phi[S_j]\}\right).$$

Taking the maximum provides $\max_S I_c(S, \Phi) \le C_\chi(\Phi)$. Applying this to $\Phi^{\otimes n}$ leads to (10.44).

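Proof of the inequality $\le$. Denote by $\bar Q(\Phi)$ the right-hand side of (10.43). A simpler part of the theorem is the proof of the inequality

$$Q(\Phi) \le \bar Q(\Phi), \tag{10.45}$$

i.e. the weak converse of the Coding Theorem. Let $\mathcal{H}^{(n)}$ be the input space of dimensionality $d_n = \dim\mathcal{H}^{(n)} = 2^{nR}$ and let $\mathcal{E}^{(n)}$, $\mathcal{D}^{(n)}$ be an encoding and decoding such that

$$1 - F_e\left(\bar S^{(n)}, \mathcal{D}^{(n)}\circ\Phi^{\otimes n}\circ\mathcal{E}^{(n)}\right) \le \varepsilon,$$

where $\bar S^{(n)} = \frac{1}{d_n}I_{\mathcal{H}^{(n)}}$ is the chaotic state in $\mathcal{H}^{(n)}$, with the entropy $H(\bar S^{(n)}) = \log d_n = nR$. Here, we are using the requirement (10.37) in the definition of the achievable rate. According to Lemma 10.23, we can assume that $\mathcal{E}^{(n)}$ are isometric encodings. Since

$$F_e\left(\bar S^{(n)}, \mathcal{D}^{(n)}\circ\Phi^{\otimes n}\circ\mathcal{E}^{(n)}\right) = \langle\Omega^{(n)}|\left(\mathrm{Id}_{\mathcal{H}^{(n)}}\otimes\mathcal{D}^{(n)}\circ\Phi^{\otimes n}\circ\mathcal{E}^{(n)}\right)\left[|\Omega^{(n)}\rangle\langle\Omega^{(n)}|\right]|\Omega^{(n)}\rangle,$$

where $|\Omega^{(n)}\rangle$ is the maximally entangled vector, it follows by Lemma 10.9 that for the maximally entangled state in $\mathcal{H}^{(n)}\otimes\mathcal{H}^{(n)}$ the following holds,

$$\left\||\Omega^{(n)}\rangle\langle\Omega^{(n)}| - \left(\mathrm{Id}_{\mathcal{H}^{(n)}}\otimes\mathcal{D}^{(n)}\circ\Phi^{\otimes n}\circ\mathcal{E}^{(n)}\right)\left[|\Omega^{(n)}\rangle\langle\Omega^{(n)}|\right]\right\|_1 \le 2\sqrt{\varepsilon}. \tag{10.46}$$

This means that the rate $R$ is achievable for asymptotically perfect transmission of the chaotic state in $\mathcal{H}^{(n)}$.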

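Denoting $S^{(n)} = \mathcal{E}^{(n)}[\bar S^{(n)}]$ and using the chain rule (10.6) for the coherent information, we have

$$I_c(S^{(n)}, \Phi^{\otimes n}) \ge I_c(S^{(n)}, \mathcal{D}^{(n)}\circ\Phi^{\otimes n}) = H(S_{B'}) - H(S_{B'R'}), \tag{10.47}$$

where $S_{B'R'} = \left(\mathrm{Id}_{\mathcal{H}^{(n)}}\otimes(\mathcal{D}^{(n)}\circ\Phi^{\otimes n}\circ\mathcal{E}^{(n)})\right)\left[|\Omega^{(n)}\rangle\langle\Omega^{(n)}|\right]$, so that

$$S_{B'} = \left(\mathcal{D}^{(n)}\circ\Phi^{\otimes n}\circ\mathcal{E}^{(n)}\right)[\bar S^{(n)}].$$

The equality $H(S_{B'R'}) = H(S^{(n)}, \mathcal{D}^{(n)}\circ\Phi^{\otimes n})$ follows from the fact that, due to the isometric nature of the encoding $\mathcal{E}^{(n)}$, the state $\left(\mathrm{Id}_{\mathcal{H}^{(n)}}\otimes\mathcal{E}^{(n)}\right)\left[|\Omega^{(n)}\rangle\langle\Omega^{(n)}|\right]$ is pure and can be regarded as a purification of $S^{(n)}$. According to (10.46),

$$\left\|\bar S^{(n)} - S_{B'}\right\|_1 \le \left\||\Omega^{(n)}\rangle\langle\Omega^{(n)}| - S_{B'R'}\right\|_1 \le 2\sqrt{\varepsilon}.$$

Here the first inequality follows from a simple observation, to be proved in the following exercise.

Exercise 10.25. Prove the following statement: the trace norm does not increase under partial trace, i.e.

$$\left\|\mathrm{Tr}_{\mathcal{H}_0}T\right\|_1 \le \|T\|_1$$

for any positive operator $T$ in $\mathcal{H}\otimes\mathcal{H}_0$. Hint: Use expression (1.19) for the trace norm.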

By twice using the estimate (7.26) for the continuity of the entropy, we have

$$H(S_{B'}) - H(S_{B'R'}) = H(\bar S^{(n)}) + \left[H(S_{B'}) - H(\bar S^{(n)})\right] + \left[H(|\Omega^{(n)}\rangle\langle\Omega^{(n)}|) - H(S_{B'R'})\right]$$

$$\ge H(\bar S^{(n)}) - 6\log\dim\mathcal{H}^{(n)}\sqrt{\varepsilon} - \frac{2\log e}{e} = nR\left(1 - 6\sqrt{\varepsilon}\right) - \frac{2\log e}{e}.$$

Therefore, taking into account (10.47),

$$Q_n = \max_{S^{(n)}} I_c(S^{(n)}, \Phi^{\otimes n}) \ge nR\left(1 - 6\sqrt{\varepsilon}\right) - \frac{2\log e}{e},$$

whence $R \le \lim_{n\to\infty}\frac{1}{n}Q_n = \bar Q(\Phi)$.

This completes the proof of inequality (10.45), i.e. the Weak Converse for the quantum capacity. The proof of the Direct Coding Theorem is postponed until Section 10.4.4. □


10.3.3 Degradable channels 

First of all, we obtain important relations between the coherent information and the quantity

$$\chi\left(\{\pi_x\}, \{S_x\}\right) = H\Big(\sum_x\pi_xS_x\Big) - \sum_x\pi_xH(S_x),$$

giving the upper bound (5.16) for the amount of classical information and, as a consequence, a relation between the quantum and the classical capacities of a channel and its complementary.
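To make the quantity $\chi$ concrete, here is a minimal sketch that evaluates it in the special case where all states of the ensemble commute, so that each $S_x$ is represented by its vector of eigenvalues in a common basis and the von Neumann entropy reduces to the Shannon entropy. Logarithms are taken to base 2, and the names `shannon` and `holevo_chi_diag` are illustrative assumptions; general noncommuting ensembles would require an eigenvalue decomposition.

```python
import math

def shannon(p):
    """Shannon entropy (in bits) of a probability vector, with 0 log 0 = 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def holevo_chi_diag(weights, spectra):
    """chi({pi_x}, {S_x}) for commuting states S_x given by their spectra."""
    dim = len(spectra[0])
    # Spectrum of the average state sum_x pi_x S_x.
    average = [sum(w * s[i] for w, s in zip(weights, spectra)) for i in range(dim)]
    return shannon(average) - sum(w * shannon(s) for w, s in zip(weights, spectra))

# Two orthogonal pure states with equal weights carry one bit.
print(holevo_chi_diag([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]]))
# Identical states carry no information.
print(holevo_chi_diag([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]]))
```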

Consider a channel $\Phi : \mathfrak{T}(\mathcal{H}_A) \to \mathfrak{T}(\mathcal{H}_B)$, its Stinespring dilation with the isometry $V : \mathcal{H}_A \to \mathcal{H}_B\otimes\mathcal{H}_E$, and the complementary channel (see Section 6.6)

$$\tilde\Phi[S] = \mathrm{Tr}_{\mathcal{H}_B}VSV^*; \qquad S \in \mathfrak{T}(\mathcal{H}_A). \tag{10.48}$$

Let $S = \sum_x\pi_xS_x$ be an arbitrary decomposition of the state $S$ into pure states. In this case, the following equalities hold.

$$I_c(S, \Phi) = H(\Phi[S]) - H(\tilde\Phi[S])$$

$$= \Big[H(\Phi[S]) - \sum_x\pi_xH(\Phi[S_x])\Big] - \Big[H(\tilde\Phi[S]) - \sum_x\pi_xH(\tilde\Phi[S_x])\Big] \tag{10.49}$$

$$= \chi\left(\{\pi_x\}, \{\Phi[S_x]\}\right) - \chi\left(\{\pi_x\}, \{\tilde\Phi[S_x]\}\right) \tag{10.50}$$

$$= \sum_x\pi_x\left[H(\Phi[S_x]; \Phi[S]) - H(\tilde\Phi[S_x]; \tilde\Phi[S])\right]. \tag{10.51}$$

Here, equality (10.49) follows from the fact that the state $VS_xV^*$ is pure and hence

$$H(\tilde\Phi[S_x]) = H(\Phi[S_x])$$

for all $x$, and (10.51) follows from identity (7.19). Taking the upper and the lower bounds in (10.50), with respect to all possible ensembles of pure states, we obtain

$$-Q_1(\tilde\Phi) \le C_\chi(\Phi) - C_\chi(\tilde\Phi) \le Q_1(\Phi), \tag{10.52}$$

where we introduced the notation

$$Q_1(\Phi) = \max_S I_c(S, \Phi). \tag{10.53}$$

Applying (10.52) to channel $\Phi^{\otimes n}$, taking the limit $n \to \infty$, and using the Coding Theorems, we obtain

$$-Q(\tilde\Phi) \le C(\Phi) - C(\tilde\Phi) \le Q(\Phi).$$

Figure 10.1. Degradable channel. 


Now, we introduce an important class of channels for which the quantum capacity is given by the “one-letter” expression (10.53).

Definition 10.26. The channel $\Phi$ is called degradable if there exists a channel $\Gamma$ such that $\tilde\Phi = \Gamma\circ\Phi$, and anti-degradable if there exists a channel $\Gamma'$ such that $\Phi = \Gamma'\circ\tilde\Phi$.

Obviously, $\Phi$ is degradable if and only if $\tilde\Phi$ is anti-degradable. From relation (10.51), by using the monotonicity of the relative entropy, we deduce that if $\Phi$ is an anti-degradable channel, then

$$I_c(S, \Phi) \le 0 \quad \text{for all states } S \tag{10.54}$$

(correspondingly, $I_c(S, \Phi) \ge 0$ for a degradable channel).

Proposition 10.27. If $\Phi$ is an anti-degradable channel, then $Q(\Phi) = 0$. If $\Phi$ is degradable, then

$$Q(\Phi) = Q_1(\Phi) = \max_S I_c(S, \Phi). \tag{10.55}$$

Proof. The first statement follows from (10.54) and the Weak Converse (10.45).

Let $\Phi$ be a degradable channel and $W : \mathcal{H}_B \to \mathcal{H}_E\otimes\mathcal{H}_{E'}$ be the Stinespring isometry for the channel $\Gamma : \mathfrak{T}(\mathcal{H}_B) \to \mathfrak{T}(\mathcal{H}_E)$. Then $I_c(S, \Phi) = H(B) - H(E) = H(EE') - H(E)$, so that

$$I_c(S, \Phi) = H(E'|E).$$

For two degradable channels $\Phi_1$, $\Phi_2$ and for an arbitrary state $S_{12}$ in $\mathcal{H}_{A_1}\otimes\mathcal{H}_{A_2}$, we have

$$I_c(S_{12}, \Phi_1\otimes\Phi_2) = H(E_1'E_2'|E_1E_2).$$

Hence, by subadditivity (7.43) of the conditional entropy,

$$I_c(S_{12}, \Phi_1\otimes\Phi_2) \le I_c(S_1, \Phi_1) + I_c(S_2, \Phi_2),$$

implying

$$\max_{S^{(n)}} I_c(S^{(n)}, \Phi^{\otimes n}) = n\max_S I_c(S, \Phi) \tag{10.56}$$

and hence (10.55). □

Consider an entanglement-breaking channel

$$\Phi[S] = \sum_{k=1}^{d}|\varphi_k\rangle\langle\psi_k|S|\psi_k\rangle\langle\varphi_k|,$$

where $\{\psi_k\}$ is an overcomplete system in $\mathcal{H}$, and $\{\varphi_k\}$ is a system of unit vectors in the output space $\mathcal{H}'$. The complementary channel is

$$\tilde\Phi[S] = \sum_{k,l=1}^{d}c_{kl}|e_k\rangle\langle\psi_k|S|\psi_l\rangle\langle e_l|,$$

where $c_{kl} = \langle\varphi_l|\varphi_k\rangle$, while $\{e_k\}$ is the standard basis in $\mathbb{C}^d$. We have $\Phi = \Gamma'\circ\tilde\Phi$, where

$$\Gamma'[S] = \sum_{k=1}^{d}|\varphi_k\rangle\langle e_k|S|e_k\rangle\langle\varphi_k|.$$

This proves

Corollary 10.28. Any entanglement-breaking channel $\Phi$ is anti-degradable.

The first statement of Proposition 10.27 again implies $Q(\Phi) = 0$, thus reconfirming Corollary 10.22.


Example 10.29. Consider the quantum erasure channels $\Phi_p$. For $q \ge p$,

$$\Phi_q = \Gamma_{(1-q)/(1-p)}\circ\Phi_p,$$

where $\Gamma_\alpha : \mathfrak{T}(\mathcal{H}\oplus\mathbb{C}) \to \mathfrak{T}(\mathcal{H}\oplus\mathbb{C})$ is the channel acting as

$$\Gamma_\alpha\begin{bmatrix} S & 0 \\ 0 & s \end{bmatrix} = \alpha\begin{bmatrix} S & 0 \\ 0 & s \end{bmatrix} + (1-\alpha)\begin{bmatrix} 0 & 0 \\ 0 & \mathrm{Tr}\,S + s \end{bmatrix}.$$

Combining this with the fact that $\tilde\Phi_p = \Phi_{1-p}$ (see Exercise 6.35), we find that $\Phi_p$ is degradable for $p \in [0,1/2]$ and anti-degradable for $p \in [1/2,1]$. Therefore, by Proposition 10.27, we obtain

$$Q(\Phi_p) = \begin{cases} (1-2p)\log d, & p \in [0,1/2]; \\ 0, & p \in [1/2,1]. \end{cases} \tag{10.57}$$

Here, the first line follows from (10.55) together with the fact that $H(\Phi_p[S]) = (1-p)H(S) + h_2(p)$. Hence, $I_c(S, \Phi_p) = H(\Phi_p[S]) - H(\tilde\Phi_p[S]) = (1-p)H(S) - pH(S) = (1-2p)H(S)$. Maximizing over $S$ gives (10.57).
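Formula (10.57) is simple enough to tabulate directly. The sketch below assumes logarithms to base 2, so that the capacity is measured in qubits per channel use; the function names are illustrative.

```python
import math

def coherent_information_erasure(p, entropy_s):
    """I_c(S, Phi_p) = (1 - 2p) H(S) for the erasure channel with erasure probability p."""
    return (1.0 - 2.0 * p) * entropy_s

def quantum_capacity_erasure(p, d):
    """Quantum capacity (10.57) of the erasure channel in dimension d, in qubits."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("erasure probability must lie in [0, 1]")
    if d < 1:
        raise ValueError("dimension must be a positive integer")
    # The maximum of (1 - 2p) H(S) is attained at the chaotic state, H(S) = log d,
    # when p <= 1/2; for p >= 1/2 the channel is anti-degradable and Q = 0.
    return max(0.0, coherent_information_erasure(p, math.log2(d)))

for p in (0.0, 0.25, 0.5, 0.75):
    print(p, quantum_capacity_erasure(p, d=2))
```

The capacity decreases linearly from $\log d$ at $p = 0$ to zero at $p = 1/2$ and stays at zero beyond that point.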


10.4 The private classical capacity and the quantum 
capacity 


10.4.1 The quantum wiretap channel 

Consider the situation where the quantum communication channel is accessed by three parties, a sender A, a receiver B and an eavesdropper E. The mathematical model of the quantum wiretap channel consists of three Hilbert spaces $\mathcal{H}_A$, $\mathcal{H}_B$, $\mathcal{H}_E$ and an isometric map $V : \mathcal{H}_A \to \mathcal{H}_B\otimes\mathcal{H}_E$, so that the input state $S_A$ is mapped to the state $S_{BE} = \Phi_{BE}[S_A] = VS_AV^*$ of the system BE, with the partial states

$$S_B = \Phi_B[S_A] = \mathrm{Tr}_E\,VS_AV^*, \qquad S_E = \Phi_E[S_A] = \mathrm{Tr}_B\,VS_AV^*.$$

Recall that the notation E was used previously for the environment. This is a consistent change, if we regard the whole environment as accessible to the eavesdropper.

Assume that the party A chooses an ensemble of states $\{S_A^x\}$ with probabilities $\{\pi_x\}$, so that B and E receive states $\{S_B^x\}$ and $\{S_E^x\}$, respectively. In this case, the upper bounds for the Shannon informations of B and E are, correspondingly, $\chi(\{\pi_x\}, \{S_B^x\})$ and $\chi(\{\pi_x\}, \{S_E^x\})$. By analogy with the classical wiretap channel, see Section 4.5, the “quantum privacy” can be characterized by the quantity $\chi(\{\pi_x\}, \{S_B^x\}) - \chi(\{\pi_x\}, \{S_E^x\})$. Assuming that the input states are pure and denoting by $S_A = \sum_x\pi_xS_A^x$ the average of the input ensemble, we obtain, from (10.50), a key relation (Schumacher and Westmoreland [178]),

$$I_c(S_A, \Phi_B) = \chi\left(\{\pi_x\}, \{S_B^x\}\right) - \chi\left(\{\pi_x\}, \{S_E^x\}\right). \tag{10.58}$$

This points to the profound connection between the quantum capacity and the private classical capacity, which provides a hint to a proof of the Direct Coding Theorem.

To define the private classical capacity, let us consider block coding for the quantum wiretap channel. The goal is to transmit the maximum amount of classical information from A to B in a private way, so that E can receive only an asymptotically negligible amount of information. A code $(\Sigma^{(n)}, M^{(n)})$ of length $n$ and of size $N$ is defined as in Definition 8.1, i.e. as a collection of states $\Sigma^{(n)} = \{S^i_{A^{(n)}};\ i = 1,\ldots,N\}$ in $\mathcal{H}_A^{\otimes n}$, together with an observable $M^{(n)} = \{M_j;\ j = 0,1,\ldots,N\}$ in $\mathcal{H}_B^{\otimes n}$. Along with its error probability $P_e(\Sigma^{(n)}, M^{(n)})$ defined by (8.1), we consider the new quantity

$$\nu_E(\Sigma^{(n)}) = \max_{j,k=1,\ldots,N}\left\|S^j_{E^{(n)}} - S^k_{E^{(n)}}\right\|_1,$$

which characterizes the variability, and hence the information content of the output states for the eavesdropper E. Notice that, by the triangle inequality for the norm, we have

$$\left\|S^i_{E^{(n)}} - \bar S_{E^{(n)}}\right\|_1 \le \nu_E \quad \text{for all } i,$$

where $\bar S_{E^{(n)}} = \frac{1}{N}\sum_{i=1}^{N}S^i_{E^{(n)}}$. Applying the inequality (7.25), we obtain

$$\chi\left(\{1/N\}, \{S^i_{E^{(n)}}\}\right) = \frac{1}{N}\sum_{i=1}^{N}\left[H(\bar S_{E^{(n)}}) - H(S^i_{E^{(n)}})\right] \le \log\dim\mathcal{H}_E^{\otimes n}\cdot\nu_E(\Sigma^{(n)}) + \eta\left(\nu_E(\Sigma^{(n)})\right), \tag{10.59}$$

provided that $\nu_E(\Sigma^{(n)})$ is small enough. Therefore, if the variability is small, the Shannon mutual information between A and E, with equiprobable encoding, is also small.

We call $R$ the achievable rate for the wiretap channel if there exists a sequence of codes $(\Sigma^{(n)}, M^{(n)})$ of sizes $N = 2^{nR}$, such that

$$\lim_{n\to\infty}P_e(\Sigma^{(n)}, M^{(n)}) = 0$$

and

$$\lim_{n\to\infty}\nu_E(\Sigma^{(n)}) = 0. \tag{10.60}$$

The least upper bound of the achievable rates is called the private classical capacity $C_p(\Phi_{BE})$ of the wiretap channel. By (10.59), condition (10.60) implies asymptotic vanishing of the mutual information between A and E. It is possible to show that these conditions are, in fact, equivalent, but condition (10.60) is more convenient technically and also is more useful for later application to the quantum capacity.

Theorem 10.30 (Devetak [51]; Cai, Winter and Yeung [34]).

$$C_p(\Phi_{BE}) = \lim_{n\to\infty}\frac{1}{n}\max_{\pi^{(n)},\Sigma^{(n)}}\left[\chi\left(\{\pi_i^{(n)}\}, \{S^i_{B^{(n)}}\}\right) - \chi\left(\{\pi_i^{(n)}\}, \{S^i_{E^{(n)}}\}\right)\right], \tag{10.61}$$

where the maximization is over all ensembles, i.e. finite collections of states $\Sigma^{(n)} = \{S^i_{A^{(n)}}\}$ in $\mathcal{H}_A^{\otimes n}$ with probability distributions $\pi^{(n)} = \{\pi_i^{(n)}\}$ (we use the notations
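The formulas (10.43), (10.58), and (10.61) imply an important relation between the quantum and the private classical capacities

$$Q(\Phi_B) \le C_p(\Phi_{BE}) \tag{10.62}$$

following from the fact that $C_p(\Phi_{BE})$ involves arbitrary state ensembles for A, while $Q(\Phi_B)$ involves only pure ones. In general, the inequality can be strict. Therefore, the following statement is of special interest.

Proposition 10.31. If the channel $\Phi_B$ is degradable,

$$C_p(\Phi_{BE}) = Q(\Phi_B) = Q_1(\Phi_B). \tag{10.63}$$

Proof. By using equality (10.50), we have

$$\chi\left(\{\pi_i^{(n)}\}, \{S^i_{B^{(n)}}\}\right) - \chi\left(\{\pi_i^{(n)}\}, \{S^i_{E^{(n)}}\}\right) = \Big[H\Big(\sum_i\pi_i^{(n)}S^i_{B^{(n)}}\Big) - H\Big(\sum_i\pi_i^{(n)}S^i_{E^{(n)}}\Big)\Big] - \sum_i\pi_i^{(n)}\left[H(S^i_{B^{(n)}}) - H(S^i_{E^{(n)}})\right]$$

$$= I_c\Big(\sum_i\pi_i^{(n)}S^i_{A^{(n)}}, \Phi_B^{\otimes n}\Big) - \sum_i\pi_i^{(n)}I_c\left(S^i_{A^{(n)}}, \Phi_B^{\otimes n}\right) \le I_c\Big(\sum_i\pi_i^{(n)}S^i_{A^{(n)}}, \Phi_B^{\otimes n}\Big),$$

since channel $\Phi_B^{\otimes n}$, along with $\Phi_B$, is degradable and hence, $I_c(S^i_{A^{(n)}}, \Phi_B^{\otimes n}) \ge 0$. Therefore, by using (10.62),

$$\max_{\pi^{(n)},\Sigma^{(n)}}\left[\chi\left(\{\pi_i^{(n)}\}, \{S^i_{B^{(n)}}\}\right) - \chi\left(\{\pi_i^{(n)}\}, \{S^i_{E^{(n)}}\}\right)\right] = \max_{S^{(n)}}I_c\left(S^{(n)}, \Phi_B^{\otimes n}\right),$$

which is equal to $nQ_1(\Phi_B)$ due to (10.56). Substituting this into (10.61) we obtain (10.63). □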
10.4.2 Proof of the Private Capacity Theorem

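We first prove the inequality $\le$, i.e. the Weak Converse. Let $R$ be an achievable rate. In this case, by (5.33) and the information bound (5.16),

$$P_e(\Sigma^{(n)}, M^{(n)}) \ge 1 - \frac{1}{nR} - \frac{\chi\left(\{1/N\}, \{S^i_{B^{(n)}}\}\right)}{nR}$$

$$= 1 - \frac{\chi\left(\{1/N\}, \{S^i_{B^{(n)}}\}\right) - \chi\left(\{1/N\}, \{S^i_{E^{(n)}}\}\right)}{nR} - \frac{1 + \chi\left(\{1/N\}, \{S^i_{E^{(n)}}\}\right)}{nR}. \tag{10.64}$$

According to (10.59),

$$\chi\left(\{1/N\}, \{S^i_{E^{(n)}}\}\right) \le \nu_E(\Sigma^{(n)})\left[n\log d_E - \log\nu_E(\Sigma^{(n)})\right],$$

where $d_E = \dim\mathcal{H}_E$. Therefore, the last term in the right hand side of (10.64) tends to zero as $n \to \infty$, along with the left hand side. From this follows $0 \ge 1 - \bar C_p/R$, where $\bar C_p$ is the right hand side of (10.61), and hence $C_p(\Phi_{BE}) \le \bar C_p$.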

To prove the direct statement it is sufficient to show that for an arbitrary ensemble $\{\pi_x, S_A^x\}$ and $\delta > 0$ the rate $\chi(\{\pi_x\}, \{S_B^x\}) - \chi(\{\pi_x\}, \{S_E^x\}) - \delta$ is achievable. Achievability of the rate $\bar C_p - \delta$ then follows by additional blocking. Consider the c-q channel $x \to S_B^x$ and the random codebook $W^{(n)}$ of size $N = 2^{n[\chi(\{\pi_x\},\{S_B^x\}) - c\delta]}$, with independent words having the probability distribution

$$\mathsf{P}\{w = (x_1,\ldots,x_n)\} = \pi_{x_1}\cdots\pi_{x_n}. \tag{10.65}$$

The constant $c$ in the definition of $N$ will be fixed later. To simplify the notation, we will denote all constants that appear below by the same letter $c$. In fact, we can always take the maximum of them. By the remark that follows the proof of the Direct Quantum Coding Theorem at the end of Section 5.6, for sufficiently large $n$, there exist decision rules for the system B such that

$$\mathsf{E}\,P_e(W^{(n)}, M^{(n)}) \le 2^{-n\beta}, \qquad \beta > 0.$$

Now, consider the new random code with independent words having the uniform distribution on the set $\hat T^{n,\delta}$ of $\delta$-strongly typical words,

$$\tilde{\mathsf{P}}(w) = \begin{cases} \dfrac{1}{|\hat T^{n,\delta}|}, & w \in \hat T^{n,\delta}; \\ 0, & w \notin \hat T^{n,\delta}. \end{cases}$$

By Exercise 9.10, $|\hat T^{n,\delta}| \ge 2^{n[H(\pi) - \Delta_n(\delta)]}$, where $\lim_{\delta\to0}\lim_{n\to\infty}\Delta_n(\delta) = 0$. Since $\mathsf{P}(w) \ge 2^{-n[H(\pi) + c\delta]}$ for $w \in \hat T^{n,\delta}$, where $c$ is a constant (depending on $\{\pi_x\}$), we have $\tilde{\mathsf{P}}(w) \le 2^{n[c\delta + \Delta_n(\delta)]}\mathsf{P}(w)$ for all $w$ and hence

$$\tilde{\mathsf{E}}\,P_e(W^{(n)}, M^{(n)}) \le 4^{n[c\delta + \Delta_n(\delta)]}2^{-n\beta} \le \varepsilon$$

for $n$ large enough, because the estimate (5.52) for $P_e(W^{(n)}, M^{(n)})$ involves at most two independent words. Hence, by the Markov inequality,

$$\tilde{\mathsf{P}}\left\{P_e(W^{(n)}, M^{(n)}) \ge \sqrt{\varepsilon}\right\} \le \sqrt{\varepsilon}. \tag{10.66}$$

So far, this was just a slight modification of the random code for the classical information transmission from A to B. To make it secure against eavesdropping, A has to sacrifice $n[\chi(\{\pi_x\}, \{S_E^x\}) + c\delta/2]$ bits of information, by additional randomization of the input. Assuming $\chi(\{\pi_x\}, \{S_B^x\}) - \chi(\{\pi_x\}, \{S_E^x\}) > 0$, set

$$N_E = 2^{n[\chi(\{\pi_x\},\{S_E^x\}) + c\delta/2]}, \qquad N_B = 2^{n[\chi(\{\pi_x\},\{S_B^x\}) - \chi(\{\pi_x\},\{S_E^x\}) - 3c\delta/2]},$$

so that $N_EN_B = N$, and arrange the codebook $W^{(n)}$ in a rectangular array with $N_B$ rows and $N_E$ columns. In this case,

$$W^{(n)} = \{w^{mj};\ m = 1,\ldots,N_B;\ j = 1,\ldots,N_E\},$$

$$M^{(n)} = \{M_{mj};\ m = 1,\ldots,N_B;\ j = 1,\ldots,N_E\}.$$

(For brevity of notation we omit the zero component of $M^{(n)}$ which corresponds to “no decision”.)
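The bookkeeping behind this construction can be illustrated with a small sketch. It assumes that the two $\chi$-quantities, the constant $c$ and $\delta$ are given as plain numbers, works with base-2 exponents rather than the astronomically large sizes themselves, and uses a toy codebook of integers in place of codewords. All function names are illustrative and not part of the text.

```python
import random

def wiretap_code_exponents(n, chi_b, chi_e, c, delta):
    """Return (log2 N, log2 N_E, log2 N_B) for the code sizes of the construction."""
    log_n = n * (chi_b - c * delta)                     # total codebook size N
    log_n_e = n * (chi_e + c * delta / 2)               # columns: randomization index j
    log_n_b = n * (chi_b - chi_e - 3 * c * delta / 2)   # rows: message index m
    return log_n, log_n_e, log_n_b

def arrange_codebook(words, n_b, n_e):
    """Arrange a flat codebook of N_B * N_E words into N_B rows of N_E columns."""
    if len(words) != n_b * n_e:
        raise ValueError("codebook size must equal N_B * N_E")
    return [words[m * n_e:(m + 1) * n_e] for m in range(n_b)]

def randomized_encode(m, array, rng=random):
    """For message m, pick the column j uniformly at random and return the word w^{mj}."""
    return rng.choice(array[m])

# Assumed illustrative numbers: chi_B = 0.8, chi_E = 0.3, c = 1, delta = 0.02, n = 100.
log_n, log_n_e, log_n_b = wiretap_code_exponents(100, 0.8, 0.3, 1.0, 0.02)
print(abs(log_n - (log_n_e + log_n_b)) < 1e-9)  # N_E * N_B = N

# Toy codebook with N_B = 3 messages and N_E = 4 randomization values.
array = arrange_codebook(list(range(12)), n_b=3, n_e=4)
print(randomized_encode(2, array) in array[2])
```

Only the row index $m$ carries the message; the column index $j$ is spent on randomization, which is why the private rate is reduced by roughly $\chi(\{\pi_x\}, \{S_E^x\})$.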

For each $m$, A will take the value $j$ at random, with equal probabilities, which results in the input states

$$S^m_{A^{(n)}} = \frac{1}{N_E}\sum_{j=1}^{N_E}S^{w^{mj}}_{A^{(n)}},$$

with the corresponding outputs $S^m_{B^{(n)}}$ for B and $S^m_{E^{(n)}}$ for E. (Let us recall here that $S^w_{A^{(n)}} = S_A^{x_1}\otimes\cdots\otimes S_A^{x_n}$ for a word $w = (x_1,\ldots,x_n)$.) Such a randomization will make the transmitted information almost secret to E, because from the viewpoint of the eavesdropper, for every $m$ the codebook

$$\{w^{mj};\ j = 1,\ldots,N_E\}$$

with a high probability is a “good” codebook, that carries almost the maximum possible information from A to E, provided an optimal strategy is applied by E. Therefore, the mutual information between the codebooks with different $m$ must be close to zero. Hence, randomizing inside each of these codebooks discards almost all information carried from A to E.

The crucial step in the rigorous justification of this scenario is the application of 
the following estimate, based on a quantum version of the Bernstein inequality to be 
established in the next section. 

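Proposition 10.32. Let $\{S^{w^{mj}}_{E^{(n)}};\ m = 1,\ldots,N_B;\ j = 1,\ldots,N_E\}$ be the random i.i.d. density operators, with $N_B$, $N_E$ chosen as above. There exists a $\beta_1 > 0$ such that, for $n$ large enough,

$$\mathsf{P}\Big\{\Big\|\frac{1}{N_E}\sum_{j=1}^{N_E}S^{w^{mj}}_{E^{(n)}} - \theta\Big\|_1 \ge \varepsilon\Big\} \le 2^{-2^{n\beta_1}}, \tag{10.67}$$

for each $m = 1,\ldots,N_B$, where $\theta$ is a nonrandom operator (in fact, one can take $\theta = \mathsf{E}\,S^{w^{mj}}_{E^{(n)}}$, but we do not need this).

This estimate implies that the probability

$$\mathsf{P}\left\{\left\|S^m_{E^{(n)}} - \theta\right\|_1 < \varepsilon;\ m = 1,\ldots,N_B\right\} \ge 1 - N_B2^{-2^{n\beta_1}} = 1 - 2^{n[\chi(\{\pi_x\},\{S_B^x\}) - \chi(\{\pi_x\},\{S_E^x\}) - 3c\delta/2] - 2^{n\beta_1}}$$

can be made arbitrarily close to 1 for $n$ large enough. Together with (10.66) this implies that there exists a realization of the codebook $W^{(n)} = \{w^{mj}\}$ for which

$$P_e(W^{(n)}, M^{(n)}) < \sqrt{\varepsilon}$$

and

$$\left\|S^m_{E^{(n)}} - S^l_{E^{(n)}}\right\|_1 < 2\varepsilon; \qquad m, l = 1,\ldots,N_B. \tag{10.68}$$

Defining $\Sigma^{(n)} = \{S^m_{A^{(n)}};\ m = 1,\ldots,N_B\}$, $\tilde M^{(n)} = \{M_m;\ m = 0,1,\ldots,N_B\}$, where $M_m = \sum_{j=1}^{N_E}M_{mj}$, we have $P_e(\Sigma^{(n)}, \tilde M^{(n)}) \le P_e(W^{(n)}, M^{(n)}) < \sqrt{\varepsilon}$. An argument similar to Lemma 4.11 shows that we can choose a subcode for which the maximal error $P_e(\Sigma^{(n)}, \tilde M^{(n)}) < 2\sqrt{\varepsilon}$ while $\nu_E(\Sigma^{(n)}) < \varepsilon'$, where $\varepsilon' \to 0$ if $\varepsilon \to 0$, which proves the theorem. □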

Proof of Proposition 10.32. As in Section 5.6, we introduce the density operator

$$\bar S_\pi = \sum_x\pi_xS_E^x,$$

and we write for brevity,

$$H(\bar S_\pi) = H(E), \qquad \sum_x\pi_xH(S_E^x) = H(E|A),$$

with $\chi(\{\pi_x\}, \{S_E^x\}) = H(E) - H(E|A)$.

Denote by $\lambda_j$ the eigenvalues of $\bar S_\pi$. Let us fix a small positive $\delta$, and let $P = P^{n,\delta}$ be the strongly typical projector of the density operator $\bar S_\pi^{\otimes n}$, corresponding to the eigenvalues $\lambda_J = \lambda_{j_1}\cdots\lambda_{j_n}$ for which the sequence $J = (j_1,\ldots,j_n)$ is strongly typical, i.e. the numbers $N(j|J)$ of times the symbol $j$ appears in $J$ satisfy the condition

$$\left|\frac{N(j|J)}{n} - \lambda_j\right| < \delta, \qquad j = 1,\ldots,d_E,$$

and $N(j|J) = 0$ if $\lambda_j = 0$ (see Definition 9.9). Denote by $\lambda_j(x)$ the eigenvalues of the density operator $S_E^x$. For a codeword $w = (x_1,\ldots,x_n)$ of length $n$, let $S_w = S_E^{x_1}\otimes\cdots\otimes S_E^{x_n}$ and denote by $P_w$ the typical projector of $S_w$, corresponding to the eigenvalues $\lambda_J(w) = \lambda_{j_1}(x_1)\cdots\lambda_{j_n}(x_n)$, for which

$$\left|\frac{N(j,x|J,w)}{n} - \frac{N(x|w)}{n}\lambda_j(x)\right| < \delta, \qquad j = 1,\ldots,d_E,$$

and $N(j,x|J,w) = 0$ if $\lambda_j(x) = 0$. We shall need the following properties, which are just a reformulation of the corresponding properties for strongly typical sequences in the classical information theory, see [44]:

i. For $w \in \hat T^{n,\delta}$, the eigenvalues of the operator $P_wS_wP_w$ lie in the interval $\left(2^{-n[H(E|A) + c\delta]}, 2^{-n[H(E|A) - c\delta]}\right)$, where $c$ is another constant, depending on $\{\lambda_j(x)\}$ (to simplify the notation we use the same $c$ as before. In fact, we can choose the maximum of the two constants). In particular, $P_wS_wP_w \le 2^{-n[H(E|A) - c\delta]}P_w$.

ii. For $\varepsilon_1 > 0$ and sufficiently large $n$, $\mathrm{Tr}\,P_wS_w \ge 1 - \varepsilon_1$. (This is a corollary of the Law of Large Numbers for the probability distribution given by the eigenvalues of $S_w$).

The following property is less straightforward.

iii. For given $\varepsilon_1, \delta > 0$, there is a $\delta_1 > 0$ such that for $w \in \hat T^{n,\delta_1}$ and sufficiently large $n$, $\mathrm{Tr}\,PS_w \ge 1 - \varepsilon_1$, where $P = P^{n,\delta}$ is the strongly typical projector of $\bar S_\pi^{\otimes n}$.
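The strong typicality condition used above is a simple test on empirical frequencies. The following sketch checks it for a sequence of symbols against a reference distribution; the function name and the dictionary representation of the distribution are illustrative assumptions.

```python
from collections import Counter

def is_strongly_typical(sequence, probs, delta):
    """Check |N(j|J)/n - lambda_j| < delta for all j, and N(j|J) = 0 when lambda_j = 0.

    sequence: the sequence J of symbols; probs: dict mapping symbol j to lambda_j.
    """
    n = len(sequence)
    if n == 0:
        raise ValueError("sequence must be nonempty")
    counts = Counter(sequence)
    # Symbols outside the alphabet have probability zero and must not occur.
    if any(symbol not in probs for symbol in counts):
        return False
    for symbol, prob in probs.items():
        count = counts.get(symbol, 0)
        if prob == 0 and count != 0:
            return False
        if abs(count / n - prob) >= delta:
            return False
    return True

print(is_strongly_typical("aabb", {"a": 0.5, "b": 0.5}, delta=0.1))
print(is_strongly_typical("aaab", {"a": 0.5, "b": 0.5}, delta=0.1))
```

The typical projector $P^{n,\delta}$ is then the sum of the eigenprojectors of $\bar S_\pi^{\otimes n}$ indexed by the sequences that pass this test.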
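Proof. Denote by $p^{(k)}(j) = \langle e_j|S_E^{x_k}|e_j\rangle$, where $|e_j\rangle$ are the eigenvectors of $\bar S_\pi$, and notice that $p^{(k)}(j) = 0$ if $\lambda_j = 0$, because $\mathrm{supp}\,S_E^{x_k} \subset \mathrm{supp}\,\bar S_\pi$. In this case,

$$\mathrm{Tr}\,P^{n,\delta}S_w = \sum_{J\in B^{n,\delta}}p^{(1)}(j_1)\cdots p^{(n)}(j_n) = \mathsf{P}\left\{\left|\frac{N(j|J)}{n} - \lambda_j\right| < \delta, \text{ if } \lambda_j > 0\right\}, \tag{10.69}$$

where $\mathsf{P}$ is the probability distribution, ascribing the probability $p^{(1)}(j_1)\cdots p^{(n)}(j_n)$ to the sequence $J = (j_1,\ldots,j_n)$, and $B^{n,\delta}$ is the set of strongly typical sequences. The fact that $w \in \hat T^{n,\delta_1}$ is expressed by similar inequalities

$$\left|\frac{N(x|w)}{n} - \pi_x\right| < \delta_1, \quad \text{if } \pi_x > 0,$$

and $N(x|w) = 0$ if $\pi_x = 0$, which implies for $\bar\lambda_j = \frac{1}{n}\sum_{k=1}^{n}p^{(k)}(j)$

$$\left|\bar\lambda_j - \lambda_j\right| = \Big|\langle e_j|\Big(\frac{1}{n}\sum_{k=1}^{n}S_E^{x_k} - \sum_x\pi_xS_E^x\Big)|e_j\rangle\Big| \le \sum_x\left|\frac{N(x|w)}{n} - \pi_x\right|\langle e_j|S_E^x|e_j\rangle < \delta_1K,$$

where $K$ is the number of symbols $x$. Therefore, by choosing $\delta_1 = \delta/2K$, we have

$$\mathsf{P}\left\{\left|\frac{N(j|J)}{n} - \lambda_j\right| < \delta; \text{ for all } j : \lambda_j > 0\right\} \ge \mathsf{P}\left\{\left|\frac{N(j|J)}{n} - \bar\lambda_j\right| < \delta/2; \text{ for all } j : \lambda_j > 0\right\}$$

$$\ge 1 - \sum_{j : \lambda_j > 0}\mathsf{P}\left\{\left|\frac{N(j|J)}{n} - \bar\lambda_j\right| \ge \delta/2\right\}.$$

But under the probability distribution $\mathsf{P}$, the random variable $N(j|J)$ has mean value $\sum_{k=1}^{n}p^{(k)}(j) = n\bar\lambda_j$ and variance $\sum_{k=1}^{n}p^{(k)}(j)(1 - p^{(k)}(j)) \le n/4$. Applying the Chebyshev inequality and taking into account (10.69), we obtain iii. □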

We proceed with the proof of Proposition 10.32, for which we also need the fol¬ 
lowing lemma. 


Lemma 10.33. For a positive operator $X$ and a projector $P$

$$\|X - PXP\|_1 \le 3 \sqrt{\mathrm{Tr}\, X \; \mathrm{Tr}\,(I - P)X}.$$

Proof. By the triangle inequality and the properties of the trace norm

$$\|X - PXP\|_1 \le \|(I-P)X(I-P)\|_1 + 2\|PX(I-P)\|_1 \le 3\|X(I-P)\|_1.$$

There is a unitary $U$ such that $\|X(I-P)\|_1 = \mathrm{Tr}\, X(I-P)U$. By the Cauchy-Schwarz
inequality for the trace (take $S = I/d$ in (2.24))

$$\left[\mathrm{Tr}\, X(I-P)U\right]^2 \le \mathrm{Tr}\, X \; \mathrm{Tr}\, U^*(I-P)X(I-P)U = \mathrm{Tr}\, X \; \mathrm{Tr}\,(I-P)X.$$

Hence, the result follows. □
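
As a numerical illustration of Lemma 10.33 (not part of the original argument), the following sketch checks the inequality on random finite-dimensional instances. The helper names `trace_norm`, `random_positive`, `random_projector` and `lemma_10_33_gap` are our own, and the random ensembles are chosen only for convenience.

```python
import numpy as np

def trace_norm(a):
    # Trace norm = sum of the singular values (nuclear norm).
    return np.linalg.norm(a, ord="nuc")

def random_positive(d, rng):
    # G G^* is positive semidefinite for any matrix G.
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return g @ g.conj().T

def random_projector(d, r, rng):
    # Orthogonal projector onto the span of r random vectors.
    g = rng.normal(size=(d, r)) + 1j * rng.normal(size=(d, r))
    q, _ = np.linalg.qr(g)
    return q @ q.conj().T

def lemma_10_33_gap(x, p):
    """Return rhs - lhs of Lemma 10.33; nonnegative when the bound holds."""
    eye = np.eye(x.shape[0])
    lhs = trace_norm(x - p @ x @ p)
    rhs = 3.0 * np.sqrt(np.trace(x).real * np.trace((eye - p) @ x).real)
    return rhs - lhs

rng = np.random.default_rng(0)
gaps = [lemma_10_33_gap(random_positive(6, rng), random_projector(6, 3, rng))
        for _ in range(200)]
print("bound holds on all samples:", min(gaps) >= -1e-9)
```

The check is only a sanity test of the statement; it says nothing about how tight the constant 3 is.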

To simplify notation in (10.67), we shall denote $S_E^{mj} = S_{w_j}$, where the codewords
$w_j$; $j = 1, \dots, N_E$, are i.i.d. uniformly on the set $T^{n,\delta}$. We shall obtain
(10.67) by applying the operator Bernstein inequality (10.76), to be established
in the next section, to the random operators

$$X_j = 2^{n[H(E|A) - c\delta]}\, \Pi P P_{w_j} S_{w_j} P_{w_j} P \Pi,$$

where $\Pi$ is the projector onto the eigenspace of the operator

$$\theta' = \mathsf{E}\, P P_{w_j} S_{w_j} P_{w_j} P$$

corresponding to eigenvalues $\ge 2^{-n[H(E) + 3c\delta/2]}$. By construction, $0 \le X_j \le I$,

$$M = \mathsf{E} X_j = 2^{n[H(E|A) - c\delta]}\, \mathsf{E}\left( \Pi P P_{w_j} S_{w_j} P_{w_j} P \Pi \right) = 2^{n[H(E|A) - c\delta]}\, \Pi \theta' \Pi \ge \mu \Pi$$

and $\mu = 2^{n[H(E|A) - H(E) - 5c\delta/2]} = 2^{-n[\chi(\{\pi_x\}, \{S_E^x\}) + 5c\delta/2]}$, so that $N_E \mu = 2^{2nc\delta}$
and (10.76) imply

$$\mathsf{P}\left\{ \left\| \frac{1}{N_E} \sum_{j=1}^{N_E} X_j(\omega) - M \right\|_1 > \varepsilon_2\, 2^{n[H(E|A) - c\delta]} \right\} \le 2 d^n \exp\left( -2^{2nc\delta}\, \varepsilon_2^2 / 4 \right) \le 2^{-2^{n\beta_1}} \tag{10.70}$$

for some $\beta_1 > 0$ and $n$ large enough. Here, we used the fact that $\mathrm{Tr}\, M \le 2^{n[H(E|A) - c\delta]}$.
It remains for us to derive (10.67) from (10.70).

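Denoting $\theta = \Pi \theta' \Pi$ and $S'_{w_j} = P_{w_j} S_{w_j} P_{w_j}$, we have

\begin{align*}
\left\| \frac{1}{N_E} \sum_{j=1}^{N_E} S_{w_j} - \theta \right\|_1
&\le \frac{1}{N_E} \sum_{j=1}^{N_E} \left\| S_{w_j} - P S'_{w_j} P \right\|_1 \tag{10.71} \\
&\quad + \left\| \frac{1}{N_E} \sum_{j=1}^{N_E} P S'_{w_j} P - \frac{1}{N_E} \sum_{j=1}^{N_E} \Pi P S'_{w_j} P \Pi \right\|_1 \tag{10.72} \\
&\quad + \left\| \frac{1}{N_E} \sum_{j=1}^{N_E} \Pi P S'_{w_j} P \Pi - \theta \right\|_1 . \tag{10.73}
\end{align*}

The estimate (10.70) implies

$$\mathsf{P}\left\{ \left\| \frac{1}{N_E} \sum_{j=1}^{N_E} \Pi P S'_{w_j} P \Pi - \theta \right\|_1 \le \varepsilon_2 \right\} \ge 1 - 2^{-2^{n\beta_1}}.$$

The inequality

$$\left\| \frac{1}{N_E} \sum_{j=1}^{N_E} \Pi P S'_{w_j} P \Pi - \theta \right\|_1 \le \varepsilon_2 \tag{10.74}$$

gives the estimate for the term (10.73).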

For the term (10.71), we use the triangle inequality

$$\| S_{w_j} - P S'_{w_j} P \|_1 \le \| S_{w_j} - P_{w_j} S_{w_j} P_{w_j} \|_1 + \| S'_{w_j} - P S'_{w_j} P \|_1 .$$

The first term on the right can be made small for sufficiently large $n$ by property ii.
and Lemma 10.33. Further, notice that for $w_j \in T^{n,\delta_1}$

$$\mathrm{Tr}\, P S'_{w_j} = \mathrm{Tr}\, S'_{w_j} - \mathrm{Tr}\,(I - P) S'_{w_j} \ge \mathrm{Tr}\, P_{w_j} S_{w_j} - \mathrm{Tr}\,(I - P) S_{w_j} \ge 1 - 2\varepsilon_1 \tag{10.75}$$

by ii. and iii. Applying the lemma, we can also make the second term on the right
small for sufficiently large $n$.

The term (10.72) has the form $\|S - \Pi S \Pi\|_1$. Therefore, to show that it is close to
zero, by Lemma 10.33 it is sufficient to show that

$$\mathrm{Tr}\, \Pi S = \mathrm{Tr}\, \frac{1}{N_E} \sum_{j=1}^{N_E} \Pi P S'_{w_j} P \Pi$$

is close to 1. But, according to (10.74), this is not less than $\mathrm{Tr}\, \theta - \varepsilon_2 = \mathrm{Tr}\, \Pi \theta' - \varepsilon_2$.
Now, $\mathrm{Tr}\, \Pi \theta' = \mathrm{Tr}\, \theta' - \mathrm{Tr}\,(I - \Pi)\theta' \ge \mathrm{Tr}\, \theta' - 2^{-nc\delta/2}$, because $\mathrm{Tr}\,(I - \Pi)\theta'$ is
the sum of eigenvalues of $\theta'$ that are less than $2^{-n[H(E) + 3c\delta/2]}$, while the total number
of positive eigenvalues is less than or equal to $\dim \mathrm{supp}\, \theta' \le \dim \mathrm{supp}\, P \le
2^{n[H(E) + c\delta]}$. It remains for us to prove that $\mathrm{Tr}\, \theta'$ is close to 1. However, $\theta' =
\mathsf{E}\, P S'_{w_j} P$ and $\mathrm{Tr}\, \theta' = \mathsf{E}\, \mathrm{Tr}\, P S'_{w_j} \ge 1 - 2\varepsilon_1$ by (10.75). This implies that, given
$\varepsilon$, we can choose $\varepsilon_1, \varepsilon_2$ such that (10.67) holds for $n$ large enough. □


10.4.3 Large deviations for random operators 


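Theorem 10.34 (Ahlswede and Winter [4]). Let $X_1(\omega), \dots, X_N(\omega)$ be i.i.d.
operator-valued random variables in $\mathcal{H}$, $\dim \mathcal{H} = d$, such that $0 \le X_i(\omega) \le I$,
$M = \mathsf{E} X_i \ge \mu I$. Then, for $0 \le \varepsilon \le 1$,

$$\mathsf{P}\left\{ \omega : \left\| \frac{1}{N} \sum_{j=1}^{N} X_j(\omega) - M \right\|_1 > \varepsilon\, \mathrm{Tr}\, M \right\} \le 2d \exp\left( -\frac{N \mu \varepsilon^2}{4} \right). \tag{10.76}$$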
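Proof. The proof consists of several steps. First, we prove the operator Markov
inequality. Let $X$ be a Hermitian operator-valued random variable, $t > 0$. In this case,

$$\mathsf{P}\{\omega : X(\omega) \nleq 0\} \le \mathrm{Tr}\, \mathsf{E} \exp tX,$$

where $\{\omega : X(\omega) \nleq 0\}$ is the complement of the event $\{\omega : X(\omega) \le 0\}$. Indeed,

\begin{align*}
\mathsf{P}\{\omega : X(\omega) \nleq 0\} &= \mathsf{P}\{\omega : \exp tX(\omega) \nleq I\} \\
&\le \mathsf{E}\, 1_{\{\exp tX \nleq I\}}\, \mathrm{Tr} \exp tX \\
&\le \mathrm{Tr}\, \mathsf{E} \exp tX,
\end{align*}

where the first inequality follows from the fact that $0 \le Y \nleq I$ implies $1 \le \mathrm{Tr}\, Y$.

Second, by letting $X(\omega) = \sum_{i=1}^{N} X_i(\omega)$, where $X_i$ are i.i.d. Hermitian
operator-valued random variables, we obtain

\begin{align*}
\mathsf{P}\left\{ \omega : \sum_{i=1}^{N} X_i \nleq 0 \right\} &\le \mathrm{Tr}\, \mathsf{E} \exp\left( t \sum_{i=1}^{N} X_i \right) \\
&\le \mathsf{E}\, \mathrm{Tr} \exp\left( t \sum_{i=1}^{N-1} X_i \right) \exp tX_N \\
&= \mathrm{Tr}\, \mathsf{E} \exp\left( t \sum_{i=1}^{N-1} X_i \right) \mathsf{E} \exp tX_N ,
\end{align*}

where the Golden-Thompson inequality ([27], Theorem IX.3.7) was used:

$$\mathrm{Tr} \exp(A + B) \le \mathrm{Tr} \exp A \exp B.$$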
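
The Golden-Thompson inequality can be checked numerically for Hermitian matrices. The sketch below is an illustration only; the function names are ours, and the matrix exponential is computed through the spectral decomposition, which is valid because the arguments are Hermitian.

```python
import numpy as np

def expm_hermitian(a):
    # exp(A) for Hermitian A via A = V diag(w) V^*.
    w, v = np.linalg.eigh(a)
    return (v * np.exp(w)) @ v.conj().T

def random_hermitian(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (g + g.conj().T) / 2

def golden_thompson_gap(a, b):
    """Tr exp(A) exp(B) - Tr exp(A + B); nonnegative for Hermitian A, B."""
    lhs = np.trace(expm_hermitian(a + b)).real
    rhs = np.trace(expm_hermitian(a) @ expm_hermitian(b)).real
    return rhs - lhs

rng = np.random.default_rng(0)
a, b = random_hermitian(4, rng), random_hermitian(4, rng)
print("gap for a random pair:", golden_thompson_gap(a, b))
```

For commuting $A$ and $B$ the gap vanishes, which is why the classical Chernoff argument needs no such inequality.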

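By inequality (1.18) this is less than or equal to

$$\mathrm{Tr}\, \mathsf{E} \exp\left( t \sum_{i=1}^{N-1} X_i \right) \left\| \mathsf{E} \exp tX_N \right\| .$$

Applying the estimate iteratively, we get

$$\mathsf{P}\left\{ \omega : \sum_{i=1}^{N} X_i \nleq 0 \right\} \le d\, \left\| \mathsf{E} \exp tX_i \right\|^N . \tag{10.77}$$

Third, let $X_1(\omega), \dots, X_N(\omega)$ be i.i.d. operator-valued random variables, such that
$0 \le X_i(\omega) \le I$, $\mathsf{E} X_i = \mu I$, and $0 \le \mu \le a \le 1$, then

$$\mathsf{P}\left\{ \omega : \frac{1}{N} \sum_{i=1}^{N} X_i(\omega) \nleq aI \right\} \le d \exp\left( -N h(a; \mu) \right), \tag{10.78}$$

where $h(a; \mu) = a \ln \frac{a}{\mu} + (1 - a) \ln \frac{1-a}{1-\mu}$. Indeed, by using (10.77) for $X_i(\omega) - aI$,
we obtain

$$\mathsf{P}\left\{ \omega : \frac{1}{N} \sum_{i=1}^{N} X_i(\omega) \nleq aI \right\} \le d\, \left\| \mathsf{E} \exp tX_i \right\|^N \exp(-atN).$$

But $\exp tx \le 1 + x(\exp t - 1)$ for $x \in [0,1]$; hence

$$\exp tX_i \le I + X_i (\exp t - 1),$$

and the right hand side is less than or equal to

$$d\, [1 + \mu(\exp t - 1)]^N \exp(-atN).$$

Taking $\exp t = \frac{a(1-\mu)}{\mu(1-a)} \ge 1$ provides the right hand side of (10.78). Similarly, for
$0 \le a \le \mu \le 1$,

$$\mathsf{P}\left\{ \omega : \frac{1}{N} \sum_{i=1}^{N} X_i(\omega) \ngeq aI \right\} \le d \exp\left( -N h(a; \mu) \right). \tag{10.79}$$

Note that $h(\mu; \mu) = 0$, $\frac{\partial}{\partial a} h(a; \mu)\big|_{a=\mu} = 0$, $\frac{\partial^2}{\partial a^2} h(a; \mu) = \frac{1}{a(1-a)}$. It follows that

$$\frac{\partial^2}{\partial \varepsilon^2} h(\mu(1 \pm \varepsilon); \mu) = \frac{\mu}{(1 \pm \varepsilon)(1 - \mu(1 \pm \varepsilon))} \ge \frac{\mu}{2},$$

if $|\varepsilon| \le 1$. Hence,

$$h(\mu(1 \pm \varepsilon); \mu) \ge \frac{\mu \varepsilon^2}{4}. \tag{10.80}$$

Finally, let $X_i(\omega)$ satisfy the conditions of Theorem 10.34. Consider the operator
random variables $Y_i(\omega) = \mu M^{-1/2} X_i(\omega) M^{-1/2}$, such that $0 \le Y_i(\omega) \le I$
and $\mathsf{E} Y_i = \mu I$. We have

\begin{align*}
\left\{ \omega : \left\| \frac{1}{N} \sum_{i=1}^{N} X_i(\omega) - M \right\|_1 \le \varepsilon\, \mathrm{Tr}\, M \right\}
&\supseteq \left\{ \omega : (1 - \varepsilon) M \le \frac{1}{N} \sum_{i=1}^{N} X_i(\omega) \le (1 + \varepsilon) M \right\} \\
&= \left\{ \omega : (1 - \varepsilon)\mu I \le \frac{1}{N} \sum_{i=1}^{N} Y_i(\omega) \le (1 + \varepsilon)\mu I \right\} \\
&= \left\{ \omega : (1 - \varepsilon)\mu I \le \frac{1}{N} \sum_{i=1}^{N} Y_i(\omega) \right\}
\cap \left\{ \omega : \frac{1}{N} \sum_{i=1}^{N} Y_i(\omega) \le (1 + \varepsilon)\mu I \right\}.
\end{align*}

By taking the complements and applying the estimates (10.78) and (10.79), together
with (10.80), we get (10.76). □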
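
The scalar estimates used in the proof are easy to verify numerically. The following sketch (our own helper names; natural logarithms, as in the definition of $h$) evaluates $h(a;\mu)$, the exponent obtained from the optimal choice $\exp t = a(1-\mu)/(\mu(1-a))$, and the right hand side of (10.76).

```python
import math

def h(a, mu):
    """h(a; mu) = a ln(a/mu) + (1-a) ln((1-a)/(1-mu)), with 0 ln 0 = 0."""
    def term(p, q):
        return 0.0 if p == 0.0 else p * math.log(p / q)
    return term(a, mu) + term(1.0 - a, 1.0 - mu)

def chernoff_exponent(a, mu):
    """-ln of [1 + mu (e^t - 1)] e^{-a t} at e^t = a(1-mu)/(mu(1-a)), mu < a < 1."""
    et = a * (1.0 - mu) / (mu * (1.0 - a))
    return -(math.log(1.0 + mu * (et - 1.0)) - a * math.log(et))

def tail_bound(d, n, mu, eps):
    """Right hand side of (10.76): 2 d exp(-N mu eps^2 / 4)."""
    return 2.0 * d * math.exp(-n * mu * eps * eps / 4.0)

# The optimised exponent coincides with h(a; mu), and (10.80) holds on a grid.
print(abs(chernoff_exponent(0.6, 0.3) - h(0.6, 0.3)) < 1e-12)
print(all(h(0.2 * (1 + s * e), 0.2) >= 0.2 * e * e / 4
          for e in (0.1, 0.5, 1.0) for s in (1, -1)))
```

Note that the bound (10.76) is informative only when $N\mu\varepsilon^2/4$ exceeds $\ln 2d$; this is the reason why, in the application above, $N_E \mu$ must grow exponentially with $n$.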

10.4.4 The Direct Coding Theorem for the quantum capacity 

The proof of inequality $Q(\Phi) \ge \bar Q(\Phi)$ will be based on a “coherent” version of the
proof for the private classical capacity, i.e. a version that uses the same random coding,
but where the mixtures of states are replaced with superpositions. It is sufficient to
prove that for a given input state $S$ the quantity $R = I_c(S, \Phi) - \delta$ is an achievable
rate. Achievability of $\bar Q(\Phi) - \delta$ is then shown by applying the argument to $\Phi^{\otimes n}$ and
taking the limit $n \to \infty$. In this section, we will show how to construct an encoding
and decoding such that

$$F_e\left( \bar S^{(n)}, \mathcal{D}^{(n)} \circ \Phi^{\otimes n} \circ \mathcal{E}^{(n)} \right) \to 1, \tag{10.81}$$

where $\bar S^{(n)}$ is the chaotic state in $\mathcal{H}^{(n)}$, $d_n = \dim \mathcal{H}^{(n)} = 2^{nR}$. By the argument
following Definition 10.19 this means that the rate $R$ is achievable for asymptotically
perfect transmission of quantum information.

For channel $\Phi$, we consider representation (6.7) (Theorem 6.9), namely

$$\Phi[S] = \mathrm{Tr}_E\, V S V^*,$$

where $V$ is an isometric map from the input space $\mathcal{H}_A$ to the tensor product $\mathcal{H}_B \otimes \mathcal{H}_E$
of the output and the environment spaces.

Consider a spectral decomposition $S = \sum_x \pi_x S_A^x$, where $S_A^x = |\phi_x\rangle_A \langle \phi_x|$, and
$\{|\phi_x\rangle_A\}$ form an orthonormal basis. Denote by $|\phi'_x\rangle_{BE} = V|\phi_x\rangle_A$ the joint output
states of $BE$. In this case, the states of $B$ and $E$, correspondingly, will be $S_B^x =
\mathrm{Tr}_E |\phi'_x\rangle_{BE} \langle \phi'_x|$ and $S_E^x = \mathrm{Tr}_B |\phi'_x\rangle_{BE} \langle \phi'_x|$. Let us recall that, according to (10.58),

$$I_c(S, \Phi) = \chi\left( \{\pi_x\}, \{S_B^x\} \right) - \chi\left( \{\pi_x\}, \{S_E^x\} \right). \tag{10.82}$$

Our aim is to construct a sequence of subspaces $\mathcal{H}^{(n)}$ such that $\dim \mathcal{H}^{(n)} = 2^{nR}$,
$R = I_c(S, \Phi) - \delta$, and corresponding encodings and decodings for which (10.81)
holds.

Now, consider the block channel $\Phi^{\otimes n}$ and the codebook

$$W^{(n)} = \{ w^{mj};\ m = 1, \dots, N_B;\ j = 1, \dots, N_E \},$$

used in the construction of the privacy code, with the corresponding resolution of the
identity $M^{(n)} = \{M_{mj}\}$ in $\mathcal{H}_B^{\otimes n}$, such that

$$\bar P_e\left( W^{(n)}, M^{(n)} \right) = \max_{mj} \left( 1 - \mathrm{Tr}\, S_B^{mj(n)} M_{mj} \right) \le \varepsilon, \tag{10.83}$$

where $S_B^{mj(n)} = S_B^{x_1} \otimes \dots \otimes S_B^{x_n}$ if $w^{mj} = (x_1, \dots, x_n)$, and

$$\left\| \frac{1}{N_E} \sum_{j=1}^{N_E} S_E^{mj(n)} - \theta \right\|_1 \le \sqrt{\varepsilon}; \qquad m = 1, \dots, N_B. \tag{10.84}$$

Consider the space $\mathcal{H}^{(n)}$ with the orthonormal basis $\{|m\rangle;\ m = 1, \dots, N_B\}$, where

$$N_B = 2^{n[\chi(\{\pi_x\}, \{S_B^x\}) - \chi(\{\pi_x\}, \{S_E^x\}) - \delta]} = 2^{n[I_c(S, \Phi) - \delta]}$$

and the encoding $\mathcal{E}^{(n)}$ given by the following isometric map from $\mathcal{H}^{(n)}$ to $\mathcal{H}_A^{\otimes n}$:

$$|m\rangle \to \frac{1}{\sqrt{N_E}} \sum_{j=1}^{N_E} e^{i\alpha_{mj}} |\phi_{mj}\rangle_A ,$$

where $|\phi_{mj}\rangle_A = |\phi_{x_1}\rangle_A \otimes \dots \otimes |\phi_{x_n}\rangle_A$ if $w^{mj} = (x_1, \dots, x_n)$, and $\{\alpha_{mj}\} = \alpha$ is a
collection of phases to be chosen later. Isometry of this map follows from the fact that
$\{|\phi_{mj}\rangle_A\}$ is an orthonormal system, because it is built from the orthonormal vectors
$|\phi_x\rangle_A$.
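
To make the structure of the coherent encoding concrete, here is a small numerical sketch. It is an illustration under our own assumptions: a toy codebook with distinct codewords (distinctness is what makes the product vectors orthonormal), arbitrary phases, and helper names that do not come from the text.

```python
import numpy as np
from itertools import product

def coherent_encoding(basis, codebook, phases):
    """Matrix of the map |m> -> N_E^{-1/2} sum_j exp(i alpha_mj) |phi_mj>.

    basis    : (K, K) unitary whose columns are the vectors |phi_x>
    codebook : codebook[m][j] is the codeword w^{mj}, a tuple of symbols
    phases   : phases[m][j] is the phase alpha_mj
    """
    columns = []
    for words, alphas in zip(codebook, phases):
        vec = 0
        for word, alpha in zip(words, alphas):
            prod = np.array([1.0 + 0j])
            for x in word:                      # |phi_{x_1}> (x) ... (x) |phi_{x_n}>
                prod = np.kron(prod, basis[:, x])
            vec = vec + np.exp(1j * alpha) * prod
        columns.append(vec / np.sqrt(len(words)))
    return np.stack(columns, axis=1)

# Toy example: K = 2 symbols, n = 3, N_B = 2 messages, N_E = 3 codewords each.
words = list(product(range(2), repeat=3))
codebook = [words[0:3], words[3:6]]
phases = [[0.1, 0.2, 0.3], [1.0, 2.0, 3.0]]
v = coherent_encoding(np.eye(2), codebook, phases)
print(np.allclose(v.conj().T @ v, np.eye(2)))   # isometry: V^* V = I
```

The isometry property does not depend on the phases, which is why they remain free parameters at this stage of the construction.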

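This is what was called the coherent version of the private code for the classical
information. The action of the encoding on the maximally entangled vector (10.38)
produces the vector

$$\frac{1}{\sqrt{N_B}} \sum_{m=1}^{N_B} |m\rangle \otimes \frac{1}{\sqrt{N_E}} \sum_{j=1}^{N_E} e^{i\alpha_{mj}} |\phi_{mj}\rangle_A .$$

The encoding followed by the channel gives

$$\frac{1}{\sqrt{N_B}} \sum_{m=1}^{N_B} |m\rangle \otimes \frac{1}{\sqrt{N_E}} \sum_{j=1}^{N_E} e^{i\alpha_{mj}} |\phi'_{mj}\rangle_{BE} , \tag{10.85}$$

where the vectors $|\phi'_{mj}\rangle_{BE} = V^{\otimes n} |\phi_{mj}\rangle_A$ are again orthonormal. Note that $S_B^{mj(n)} =
\mathrm{Tr}_E |\phi'_{mj}\rangle_{BE} \langle \phi'_{mj}|$, $S_E^{mj(n)} = \mathrm{Tr}_B |\phi'_{mj}\rangle_{BE} \langle \phi'_{mj}|$, where, in order to simplify notation,
we denote by $\mathrm{Tr}_E$ the partial trace with respect to $\mathcal{H}_E^{\otimes n}$ etc.

To construct the decoding $\mathcal{D}^{(n)}$, we first use the observable $M^{(n)} = \{M_{mj}\}$, which
provides a measurement of $mj$ with asymptotically vanishing error. By the discussion
of the measurement process leading to relation (6.37), there exist an auxiliary system
$\mathcal{H}_0 = \mathcal{H}_0^{(n)} \otimes \mathcal{H}_1$ with the basis $\{|m\rangle \otimes |j\rangle_1\}$, a unit vector $|\psi_0\rangle \in \mathcal{H}_0$ and a
unitary operator $U$ in $\mathcal{H}_B^{\otimes n} \otimes \mathcal{H}_0$ such that

$$M_{mj} = \mathrm{Tr}_0 \left( I_{B^{(n)}} \otimes |\psi_0\rangle \langle \psi_0| \right) U^* \left( I_{B^{(n)}} \otimes |mj\rangle_0 \langle mj| \right) U,$$

where we denoted $|mj\rangle_0 = |m\rangle \otimes |j\rangle_1$. Defining

$$|\psi_{mj}\rangle_{BE0} = \left( I_{E^{(n)}} \otimes U \right) \left( |\phi'_{mj}\rangle_{BE} \otimes |\psi_0\rangle \right),$$

we have the following state vector of the system $\mathcal{H}^{(n)} \otimes \mathcal{H}_B^{\otimes n} \otimes \mathcal{H}_E^{\otimes n} \otimes \mathcal{H}_0$

$$|\Upsilon(\alpha)\rangle = \frac{1}{\sqrt{N_B}} \sum_{m=1}^{N_B} |m\rangle \otimes \frac{1}{\sqrt{N_E}} \sum_{j=1}^{N_E} e^{i\alpha_{mj}} |\psi_{mj}\rangle_{BE0}$$

as the result of the action of the encoding, followed by the channel, and further
followed by the measurement, on the maximally entangled vector. Here, $\{|\psi_{mj}\rangle_{BE0}\}$
is again an orthonormal system. Relation (10.83) means that the “projection” of this
system onto $\mathcal{H}_0$ is close to $\{|mj\rangle_0\}$. More precisely,

$${}_0\langle mj| \left( \mathrm{Tr}_{BE} |\psi_{mj}\rangle_{BE0} \langle \psi_{mj}| \right) |mj\rangle_0 \ge 1 - \varepsilon \quad \text{for all } mj.$$

The following lemma implies that there exist unit vectors $|\chi_{mj}\rangle_{BE} \in \mathcal{H}_B^{\otimes n} \otimes \mathcal{H}_E^{\otimes n}$
such that, for $|\tilde\psi_{mj}\rangle_{BE0} = |\chi_{mj}\rangle_{BE} \otimes |mj\rangle_0$, one has

$$\left| {}_{BE0}\langle \tilde\psi_{mj} | \psi_{mj} \rangle_{BE0} \right|^2 \ge 1 - \varepsilon. \tag{10.86}$$

Lemma 10.35. Let $|\varphi\rangle_0 \in \mathcal{H}_0$, $|\psi\rangle \in \mathcal{H}_0 \otimes \mathcal{H}$ be two unit vectors. In this case,

$${}_0\langle \varphi| \left( \mathrm{Tr}_{\mathcal{H}} |\psi\rangle \langle \psi| \right) |\varphi\rangle_0 = \max_{\chi} \left| \langle \psi | \left( |\varphi\rangle_0 \otimes |\chi\rangle \right) \right|^2 , \tag{10.87}$$

where the maximum is taken over all unit vectors $|\chi\rangle \in \mathcal{H}$.

Proof. By fixing an orthonormal basis $\{|e_j\rangle\}$ in $\mathcal{H}$, we have $|\chi\rangle = \sum_j c_j |e_j\rangle$, with
$\sum_j |c_j|^2 = 1$ and

$$\left| \langle \psi | \left( |\varphi\rangle_0 \otimes |\chi\rangle \right) \right|^2 = \left| \sum_j c_j \langle \psi | \left( |\varphi\rangle_0 \otimes |e_j\rangle \right) \right|^2 .$$

Maximizing with respect to $c_j$ we obtain (10.87). □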
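
The maximization in the proof can be carried out explicitly: by the Cauchy-Schwarz inequality the optimal coefficients $c_j$ are proportional to the complex conjugates of the amplitudes $\langle \psi | (|\varphi\rangle_0 \otimes |e_j\rangle)$. The sketch below (our own function names; finite dimensions and the computational basis are assumptions made for the illustration) compares both sides of (10.87).

```python
import numpy as np

def random_unit(n, rng):
    v = rng.normal(size=n) + 1j * rng.normal(size=n)
    return v / np.linalg.norm(v)

def lemma_10_35_sides(phi, psi, d_h):
    """Both sides of (10.87) for phi in H_0 and psi in H_0 (x) H, dim H = d_h."""
    d0 = phi.shape[0]
    m = psi.reshape(d0, d_h)               # psi = sum_{a,j} m[a, j] |a> (x) |e_j>
    rho0 = m @ m.conj().T                  # partial trace over H
    lhs = np.vdot(phi, rho0 @ phi).real    # <phi| Tr_H |psi><psi| |phi>
    amp = np.conj(phi.conj() @ m)          # amp[j] = <psi|(phi (x) e_j)>
    chi = np.conj(amp) / np.linalg.norm(amp)   # optimal unit vector chi
    rhs = abs(np.vdot(psi, np.kron(phi, chi))) ** 2
    return lhs, rhs

rng = np.random.default_rng(0)
lhs, rhs = lemma_10_35_sides(random_unit(3, rng), random_unit(12, rng), 4)
print(np.isclose(lhs, rhs))
```

Any other unit vector $|\chi\rangle$ gives a value not exceeding the left hand side, in agreement with the lemma.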

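Notice that $\{|\tilde\psi_{mj}\rangle_{BE0}\}$ is again an orthonormal system, approximating
$\{|\psi_{mj}\rangle_{BE0}\}$ in the sense (10.86). Defining

$$|\Gamma(\alpha)\rangle = \frac{1}{\sqrt{N_B}} \sum_{m=1}^{N_B} |m\rangle \otimes \frac{1}{\sqrt{N_E}} \sum_{j=1}^{N_E} e^{i\alpha_{mj}} |\tilde\psi_{mj}\rangle_{BE0} ,$$

we have

$$\int \langle \Upsilon(\alpha) | \Gamma(\alpha + \alpha') \rangle \, d\mu(\alpha) = \frac{1}{N_B N_E} \sum_{m=1}^{N_B} \sum_{j=1}^{N_E} e^{i\alpha'_{mj}} \, {}_{BE0}\langle \psi_{mj} | \tilde\psi_{mj} \rangle_{BE0} ,$$

where $d\mu(\alpha)$ is the uniform distribution. Here, the integration eliminates the inner
products with $j \ne j'$. One can always pick $\alpha' = \{\alpha'_{mj}\}$ such that the right hand side
will be

$$\frac{1}{N_B N_E} \sum_{m=1}^{N_B} \sum_{j=1}^{N_E} \left| {}_{BE0}\langle \psi_{mj} | \tilde\psi_{mj} \rangle_{BE0} \right| \ge \sqrt{1 - \varepsilon}$$

by (10.86). Hence, there is an $\alpha$ such that $|\langle \Upsilon(\alpha) | \Gamma(\alpha + \alpha') \rangle| \ge \sqrt{1 - \varepsilon}$. By
introducing an additional common phase to $\alpha'_{mj}$ we can always make the inner product
positive. Hence,

$$\langle \Upsilon(\alpha) | \Gamma(\alpha + \alpha') \rangle \ge \sqrt{1 - \varepsilon}. \tag{10.88}$$

There is one more transformation to incorporate into the decoding which will
leave the system $E$ in the state almost independent of $m$. To achieve this, note that
combining (10.86) with (10.17) gives

$$\left\| \, |\psi_{mj}\rangle_{BE0} \langle \psi_{mj}| - |\tilde\psi_{mj}\rangle_{BE0} \langle \tilde\psi_{mj}| \, \right\|_1 \le 2\sqrt{\varepsilon}. \tag{10.89}$$

Now note that $S_E^{mj(n)} = \mathrm{Tr}_{B0} |\psi_{mj}\rangle_{BE0} \langle \psi_{mj}|$ and denote

$$\tilde S_E^{mj(n)} = \mathrm{Tr}_{B0} |\tilde\psi_{mj}\rangle_{BE0} \langle \tilde\psi_{mj}| = \mathrm{Tr}_B |\chi_{mj}\rangle_{BE} \langle \chi_{mj}| .$$

Then (10.89) and Exercise 10.25 imply

$$\left\| S_E^{mj(n)} - \tilde S_E^{mj(n)} \right\|_1 \le 2\sqrt{\varepsilon}, \tag{10.90}$$

which, combined with (10.84), implies

$$\left\| \frac{1}{N_E} \sum_{j=1}^{N_E} \tilde S_E^{mj(n)} - \theta \right\|_1 \le O(\sqrt{\varepsilon}); \qquad m = 1, \dots, N_B. \tag{10.91}$$

Consider the vectors

$$|\varphi_m\rangle_{BE1} = \frac{1}{\sqrt{N_E}} \sum_{j=1}^{N_E} e^{i(\alpha_{mj} + \alpha'_{mj})} |\chi_{mj}\rangle_{BE} \otimes |j\rangle_1 ,$$

which are purifications of $\frac{1}{N_E} \sum_{j=1}^{N_E} \tilde S_E^{mj(n)}$. Let $|\varphi_\theta\rangle \in \mathcal{H}_B^{\otimes n} \otimes \mathcal{H}_E^{\otimes n} \otimes \mathcal{H}_1$ be a
purification of $\theta$. Then (10.91) implies that there are unitary operators $W_m$ in $\mathcal{H}_B^{\otimes n} \otimes
\mathcal{H}_1$ such that

$${}_{BE1}\langle \varphi_\theta | \left( I_{E^{(n)}} \otimes W_m \right) | \varphi_m \rangle_{BE1} \ge 1 - O(\sqrt{\varepsilon}) \quad \text{for all } m. \tag{10.92}$$

This follows from inequality (10.33), which says that if two states are close in trace
norm, there exist purifications of these states with fidelity close to 1.

Defining a “controlled unitary” operator $W = \sum_{m=1}^{N_B} e^{i\beta_m} W_m \otimes |m\rangle \langle m|$ in $\mathcal{H}_B^{\otimes n} \otimes
\mathcal{H}_0$, where $\beta_m$ are phases to be adjusted later, we have

$$\left( I_{E^{(n)}} \otimes W \right) \left( |\varphi_m\rangle_{BE1} \otimes |m\rangle \right) = e^{i\beta_m} \left( I_{E^{(n)}} \otimes W_m \right) |\varphi_m\rangle_{BE1} \otimes |m\rangle .$$

Hence,

$$\left( I_{\mathcal{H}^{(n)}} \otimes I_{E^{(n)}} \otimes W \right) |\Gamma(\alpha + \alpha')\rangle = \frac{1}{\sqrt{N_B}} \sum_{m=1}^{N_B} |m\rangle \otimes |m\rangle \otimes e^{i\beta_m} \left( I_{E^{(n)}} \otimes W_m \right) |\varphi_m\rangle_{BE1} .$$

Denoting $|\Omega^{(n)}\rangle = \frac{1}{\sqrt{N_B}} \sum_{m=1}^{N_B} |m\rangle \otimes |m\rangle$ the maximally entangled vector in $\mathcal{H}^{(n)} \otimes
\mathcal{H}_0^{(n)}$, we have

$$\left( \langle \Omega^{(n)} | \otimes \langle \varphi_\theta | \right) \left( I_{\mathcal{H}^{(n)}} \otimes I_{E^{(n)}} \otimes W \right) |\Gamma(\alpha + \alpha')\rangle = \frac{1}{N_B} \sum_{m=1}^{N_B} e^{i\beta_m} \langle \varphi_\theta | \left( I_{E^{(n)}} \otimes W_m \right) | \varphi_m \rangle_{BE1} .$$

This can be made equal to

$$\frac{1}{N_B} \sum_{m=1}^{N_B} \left| \langle \varphi_\theta | \left( I_{E^{(n)}} \otimes W_m \right) | \varphi_m \rangle_{BE1} \right|$$

by an appropriate choice of phases $\beta_m$, which is greater than $1 - O(\sqrt{\varepsilon})$ by the
estimate (10.92). Comparing this with (10.88), we have

$$\left( \langle \Omega^{(n)} | \otimes \langle \varphi_\theta | \right) \left( I_{\mathcal{H}^{(n)}} \otimes I_{E^{(n)}} \otimes W \right) |\Upsilon(\alpha)\rangle \ge 1 - O(\sqrt{\varepsilon}),$$

as follows from the triangle inequality (10.31).

Introducing the density operator

$$S' = \mathrm{Tr}_{BE1} \left( I_{\mathcal{H}^{(n)}} \otimes I_{E^{(n)}} \otimes W \right) |\Upsilon(\alpha)\rangle \langle \Upsilon(\alpha)| \left( I_{\mathcal{H}^{(n)}} \otimes I_{E^{(n)}} \otimes W \right)^*$$

in $\mathcal{H}^{(n)} \otimes \mathcal{H}_0^{(n)}$, we have

\begin{align*}
F\left( |\Omega^{(n)}\rangle \langle \Omega^{(n)}|,\, S' \right) &= \langle \Omega^{(n)} | S' | \Omega^{(n)} \rangle \tag{10.93} \\
&\ge \left| \left( \langle \Omega^{(n)} | \otimes \langle \varphi_\theta | \right) \left( I_{\mathcal{H}^{(n)}} \otimes I_{E^{(n)}} \otimes W \right) |\Upsilon(\alpha)\rangle \right|^2 \\
&\ge 1 - O(\sqrt{\varepsilon}).
\end{align*}

On the other hand,

$$S' = \left( \mathrm{Id}_{\mathcal{H}^{(n)}} \otimes \mathcal{D}^{(n)} \circ \Phi^{\otimes n} \circ \mathcal{E}^{(n)} \right) \left[ |\Omega^{(n)}\rangle \langle \Omega^{(n)}| \right],$$

where the decoding $\mathcal{D}^{(n)}$ is defined as

$$\mathcal{D}^{(n)}\left[ S_{B^{(n)}} \right] = \mathrm{Tr}_{B^{(n)} 1}\, W U \left( S_{B^{(n)}} \otimes |\psi_0\rangle \langle \psi_0| \right) U^* W^*$$

(remember that $|\psi_0\rangle \in \mathcal{H}_0^{(n)} \otimes \mathcal{H}_1$). Then (10.93) is the same as

$$F_e\left( \bar S^{(n)}, \mathcal{D}^{(n)} \circ \Phi^{\otimes n} \circ \mathcal{E}^{(n)} \right) \ge 1 - O(\sqrt{\varepsilon}),$$

and (10.81) is established. □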

10.5 Notes and references 

1. The first examples of quantum error-correcting codes were constructed by Shor [35]
and by Steane [201]. Many authors contributed to the subsequent development of
the subject, sketched in this chapter, see a survey in Nielsen and Chuang [158], and
Gottesman [72]. Necessary and sufficient conditions for error correction of
Theorem 10.3 were proposed by Knill and Laflamme [133]. For the solution of
Exercise 10.4, see [158], n. 10.2. The possibility of error correction is of basic
importance to the problem of the realization of a quantum computer, see, e.g. Nielsen and
Chuang [158], and Valiev [210]. An architecture of a fault-tolerant quantum
computer was proposed, correcting errors not only in the quantum register, but also in the
error-correcting modules, see Kitaev [135] and Steane [201].

The importance of coherent information to perfect error correction was pointed out by
Barnum, Nielsen, and Schumacher [16].

2. The relations between the fidelity measures were studied by Fuchs [62], Barnum,
Nielsen, and Schumacher [16], and Werner and Kretschmann [140]. Fidelity for mixed
states was investigated by Uhlmann [208]. Lemma 10.17 goes back to Powers and
Størmer [169], see also [93].
3. The Coding Theorem for the quantum capacity was first conjectured by Lloyd [150]
as $Q(\Phi) = \max_\rho I_c(\rho, \Phi)$, but it was soon recognized by DiVincenzo, Shor, and
Smolin [54] that $Q_n$ is strictly superadditive, see also Smith and Yard [200]. Hence,
taking the limit in (10.43) is indeed required. Additional evidence in favor of a
heuristic random subspace coding argument was given by Shor [191]. See also the
discussion in Horodeckis’ paper [117]. The proof based on this approach was given later by
Hayden [81].

The inequality $Q(\Phi) \le \bar Q(\Phi)$ (the Converse Coding Theorem), as well as lemmas
in Section 10.3.1 were established by Barnum, Nielsen, and Schumacher [16] and
Barnum, Knill, and Nielsen [15]. Corollary 10.22 is inspired by an observation of
Smith and Smolin [198].

Relation (10.58) was obtained by Schumacher and Westmoreland [178], who used
the ideas of classical information theory [3] to relate the “privacy” of information
transmission via quantum channel $\Phi$ to the quantity

$$\chi\left( \{\pi_x\}, \{S_B^x\} \right) - \chi\left( \{\pi_x\}, \{S_E^x\} \right).$$

The notion of a degradable channel was introduced by Devetak and Shor [52], who
showed that the quantum capacity of a degradable channel is given by the one-letter
expression (10.53). This resembles the notion of a “stochastically degraded”
channel [42] for classical broadcast channels, with the role of second receiver played
by the environment. For such channels, there is a one-letter expression for the
capacity region. Anti-degradable channels were considered by Caruso, Giovannetti,
and Holevo [36]. Corollary 10.28 was obtained in the work of Cubitt, Ruskai and
Smith [43].

Smith and Yard [200] provided an explicit example of a remarkable phenomenon
named superactivation. There exist cases in which, given two quantum channels $\Phi_1$,
$\Phi_2$ with zero quantum capacity, it is possible to have $Q(\Phi_1 \otimes \Phi_2) > 0$. The
example in [200] is built by joining an anti-degradable channel $\Phi_1$ with an
entanglement-binding channel $\Phi_2$.

4. A complete proof of the Direct Coding Theorem ($Q(\Phi) \ge \bar Q(\Phi)$) was given by
Devetak [51]. It is based on an idea that reveals a profound relation between the
quantum and the private classical capacities, see (10.58). It is this proof (as well as a
simpler proof of the converse) which we follow here. The large deviation estimates
for random operators of the Bernstein type using the operator version (10.78) of
Hoeffding’s inequality ([89], Theorem 1) are due to Ahlswede and Winter [4]. Another
proof, based on random subspaces, which is closer to the initial argument of Lloyd, was
given by Hayden, Shor, and Winter [84]. Proposition 10.31 is due to Smith [196]. For
more recent results concerning tail bounds for sums of i.i.d. Hermitian matrices,
including an improvement of the Ahlswede-Winter inequality using Lieb’s theorem on
convex trace functions in place of the Golden-Thompson inequality, see Tropp [207].
It was already stressed that a quantum channel is characterized by a whole spectrum of
capacities. Apart from the capacities $C$, $C_\chi$, $C_{ea}$, $C_p$, $Q$, treated in detail in this book,
there are classical and quantum feedback capacities (denoted by $C_b$, $Q_b$
correspondingly), and the classical and quantum capacities with a classical two-side
communication (correspondingly $C_2$, $Q_2$). In classical information theory, it is well known
that feedback does not increase the Shannon capacity of a memoryless channel. In
the quantum case, a similar property is established [29] for the entanglement-assisted
capacity $C_{ea}(\Phi)$. Regarding the quantum capacity $Q(\Phi)$, it is known that it cannot
be increased by an additional unlimited forward classical communication [24, 16].
However, $Q(\Phi)$ can be increased if there is a possibility of transmitting the classical
information in the backward direction. Such a protocol would allow one to create the
maximum entanglement between the input and output, which can be used for quantum
state teleportation. By this trick, even channels with zero quantum capacity
supplemented with classical feedback can be used for the reliable transmission of quantum
information [158, 29]. For a quantum channel, there is a hierarchy

$$\begin{array}{ccccccc}
C_\chi & \le & C_b & \le & C_2 & \le & C_{ea} \\
\rotatebox[origin=c]{90}{$\le$} & & \rotatebox[origin=c]{90}{$\le$} & & \rotatebox[origin=c]{90}{$\le$} & & \rotatebox[origin=c]{90}{$\le$} \\
Q & \le & Q_b & \le & Q_2 & \le & Q_{ea}
\end{array}$$

where $\le$ should be understood as “less than or equal to for all channels and strictly less
for some channels”, see Bennett, Devetak, Shor, and Smolin [20]. Also, $C_{ea} = 2Q_{ea}$
and for some other pairs of capacities both inequalities are possible. Furthermore,
there exists a so-called “mother” protocol for information transmission that realizes
all the protocols, by using additional resources (such as e.g. feedback or
entanglement) [1]. For a very detailed and thoroughly systematized survey of the modern state
of the art in the quantum Shannon theory, see the treatise of Wilde [221].
Chapter 11

Channels with constrained inputs 


The importance of quantum channels with constrained inputs was clear from the be¬ 
ginning of quantum communication. A main question motivating the emergence of 
quantum information theory was the capacity of an optical channel with constrained 
energy of the input signal. From a mathematical point of view, the necessity for con¬ 
straints appears when the system describing the information carrier is infinite, which 
in the quantum case means using infinite-dimensional Hilbert space. One such class 
of systems, called “continuous variable” quantum systems, whose states are in a nat¬ 
ural sense “Gaussian”, will be considered in the next chapter. One of the main goals 
of the present chapter is to obtain general expressions for the capacities of infinite¬ 
dimensional channels with constrained inputs, suitable for applications to quantum 
Gaussian channels with the energy constraint. 

A new feature of the infinite-dimensional case is the discontinuity and unbound¬ 
edness of the entropy of quantum states, which requires us to pay serious attention 
to the continuity of the entropic quantities. Another important feature of channels 
in infinite dimensions is the natural emergence of generalized “continuous” ensem¬ 
bles, understood as probability measures on the set of all quantum states. Various 
capacity-like quantities for a quantum channel involve optimization with respect to 
state ensembles that satisfy the appropriate constraint. The suprema in question
turn out to be achievable under certain conditions, namely, continuity of the entropic
quantity and the compactness of the constrained set. These two mathematical
problems will be studied in the present chapter. Another subject of this chapter is the structure
of entanglement-breaking channels in infinite dimensions.

11.1 Convergence of density operators 

In what follows, $\mathcal{H}$ is a separable Hilbert space. A linear operator $A$, defined on $\mathcal{H}$, is
called bounded if it maps the unit ball into a norm-bounded subset of $\mathcal{H}$. Most of the
definitions in Chapter 2 carry over to bounded operators, and we shall explicitly
mention when a modification should be made (see also [171], Ch. VI). In particular, the
operator norm $\|A\|$ is defined as in (1.16) with the maximum replaced by the
supremum. The operator $A$ is bounded if and only if $\|A\| < \infty$. All bounded operators
form the Banach algebra $\mathfrak{B}(\mathcal{H})$. The trace norm $\| \cdot \|_1$ is defined as in (1.14), and the
collection of all bounded operators with $\|T\|_1 < \infty$ forms the Banach space $\mathfrak{T}(\mathcal{H})$
of trace-class operators. The inequality

$$|\mathrm{Tr}\, TA| \le \|T\|_1 \|A\| \tag{11.1}$$

holds for all $T \in \mathfrak{T}(\mathcal{H})$, $A \in \mathfrak{B}(\mathcal{H})$. The dual space of $\mathfrak{T}(\mathcal{H})$ is $\mathfrak{T}(\mathcal{H})^* = \mathfrak{B}(\mathcal{H})$,
which means that every continuous linear functional on $\mathfrak{T}(\mathcal{H})$ has the form $T \to
\mathrm{Tr}\, TA$ for some $A \in \mathfrak{B}(\mathcal{H})$, with the norm equal to $\|A\|$. Notice also that

$$\|T\| \le \|T\|_1 \tag{11.2}$$

and $\mathfrak{T}(\mathcal{H})$ is a proper subspace of $\mathfrak{B}(\mathcal{H})$ in the infinite-dimensional case.

A density operator (we also use the term state) is a positive trace-class operator
with unit trace (cf. Definition 2.2). The state space $\mathfrak{S}(\mathcal{H})$ is the closed convex subset
of $\mathfrak{T}(\mathcal{H})$ consisting of all density operators in $\mathcal{H}$. It is a complete separable metric
space with the metric defined by the trace norm: $\rho(S_1, S_2) = \|S_1 - S_2\|_1$.

A sequence of bounded operators $\{A_n\}$ in $\mathcal{H}$ weakly converges to an operator $A$
if $\lim_{n \to \infty} \langle \psi | A_n | \varphi \rangle = \langle \psi | A | \varphi \rangle$ for all $\varphi, \psi \in \mathcal{H}$. It turns out that the weak
convergence, which is in general weaker than the trace norm convergence, coincides
with it on $\mathfrak{S}(\mathcal{H})$:

Lemma 11.1. Let $\{S_n\}$ be a sequence of density operators in $\mathcal{H}$ weakly converging
to a density operator $S$. In this case, it converges in the trace norm.

Proof. For any finite-dimensional projector $P$,

\begin{align*}
\|S_n - S\|_1 &\le \|P(S_n - S)P\|_1 + 2\|P S_n (I - P)\|_1 + 2\|P S (I - P)\|_1 \\
&\quad + \|(I - P) S_n (I - P)\|_1 + \|(I - P) S (I - P)\|_1 .
\end{align*}

The first term on the right tends to zero for any choice of $P$, since $P S_n P \to P S P$
due to the weak convergence, and due to the equivalence of all kinds of convergence
in the finite-dimensional case. For the last two terms we have

$$\|(I - P) S (I - P)\|_1 = \mathrm{Tr}\,(I - P) S (I - P) = \mathrm{Tr}\,(I - P) S = 1 - \mathrm{Tr}\, P S,$$

which can be made arbitrarily small by the choice of $P$, and

$$\|(I - P) S_n (I - P)\|_1 = 1 - \mathrm{Tr}\, P S_n \to 1 - \mathrm{Tr}\, P S$$

by the weak convergence. Finally, the intermediate terms can be evaluated as follows:

$$\|P S_n (I - P)\|_1 = \mathrm{Tr}\, U^* P S_n (I - P) = \mathrm{Tr}\, U^* P \sqrt{S_n} \sqrt{S_n} (I - P),$$

where $U$ is the unitary operator from the polar decomposition of $P S_n (I - P)$. Now,
by the operator Cauchy-Schwarz inequality for the trace

$$\mathrm{Tr}\, U^* P \sqrt{S_n} \sqrt{S_n} (I - P) \le \sqrt{\mathrm{Tr}\, P S_n} \sqrt{1 - \mathrm{Tr}\, P S_n} \to \sqrt{\mathrm{Tr}\, P S} \sqrt{1 - \mathrm{Tr}\, P S},$$

which can again be made small by the choice of $P$. □
In what follows, when speaking of convergence of density operators or states, we
will have in mind the convergence in the sense of this lemma. A subset $K \subset \mathfrak{S}(\mathcal{H})$
is compact if any sequence $\{S_n\} \subset K$ contains a subsequence that converges to a
state $S \in K$.

Theorem 11.2. A trace norm closed subset $K$ of $\mathfrak{S}(\mathcal{H})$ is compact if and only if for
arbitrary $\varepsilon > 0$ there exists a finite rank projector $P_\varepsilon$ such that $\mathrm{Tr}\, P_\varepsilon S > 1 - \varepsilon$ for
all $S \in K$.

Proof. Let $K$ be a trace norm compact subset of $\mathfrak{S}(\mathcal{H})$. Suppose that there exists
$\varepsilon > 0$ such that for an arbitrary finite rank projector $P$ there exists a state $S \in K$
such that $\mathrm{Tr}\, P S \le 1 - \varepsilon$. Let $P_n$ be a sequence of finite rank projectors in $\mathcal{H}$
converging monotonically to the identity operator $I_{\mathcal{H}}$ in the weak operator topology
and let $S_n$ be the corresponding sequence of states in $K$. By the compactness of $K$,
there exists a subsequence $S_{n_k}$ that converges to a state $S_0 \in K$. By construction
$\mathrm{Tr}\, P_{n_l} S_{n_k} \le \mathrm{Tr}\, P_{n_k} S_{n_k} \le 1 - \varepsilon$ for $k > l$. Hence,

$$\mathrm{Tr}\, S_0 = \lim_{l \to +\infty} \mathrm{Tr}\, P_{n_l} S_0 = \lim_{l \to +\infty} \lim_{k \to +\infty} \mathrm{Tr}\, P_{n_l} S_{n_k} \le 1 - \varepsilon,$$

which contradicts the fact that $S_0 \in K \subset \mathfrak{S}(\mathcal{H})$.

Conversely, let $K$ be a closed subset of $\mathfrak{S}(\mathcal{H})$ that satisfies the criterion, and let $S_n$
be an arbitrary sequence in $K$. Since the unit ball in $\mathfrak{B}(\mathcal{H})$ is compact in the weak
operator topology (this follows from the Banach-Alaoglu Theorem, see e.g. [171],
Theorem VI.21), there exists a subsequence $S_{n_k}$ that converges weakly to a positive
operator $S_0$. We have

$$\mathrm{Tr}\, S_0 \le \liminf_{k \to \infty} \mathrm{Tr}\, S_{n_k} = 1.$$

Therefore, to prove that $S_0$ is a state, it is sufficient to show that $\mathrm{Tr}\, S_0 \ge 1$. Let $\varepsilon > 0$
and $P_\varepsilon$ be the corresponding projector. We have

$$\mathrm{Tr}\, S_0 \ge \mathrm{Tr}\, P_\varepsilon S_0 = \lim_{k \to \infty} \mathrm{Tr}\, P_\varepsilon S_{n_k} \ge 1 - \varepsilon,$$

where the equality follows from the fact that $P_\varepsilon$ has finite rank. Thus, $S_0$ is a state.
Lemma 11.1 implies that the subsequence $S_{n_k}$ converges to the state $S_0$ in the trace
norm. Thus, the set $K$ is compact. □

In the sequel we will need the following partial infinite-dimensional analog of the
spectral decomposition (1.5). Let $\{|e_j\rangle\}$ be an orthonormal basis in $\mathcal{H}$ and $\{f_j\}$ a
sequence of real numbers bounded from below. In this case, the formula

$$F|\psi\rangle = \sum_j f_j\, |e_j\rangle \langle e_j | \psi \rangle \tag{11.3}$$

defines a self-adjoint operator $F$ (see Section 12.1.1) on the dense domain

$$\mathcal{D}(F) = \Big\{ \psi : \sum_j |f_j|^2\, |\langle e_j | \psi \rangle|^2 < \infty \Big\}, \tag{11.4}$$

for which $|e_j\rangle$ are the eigenvectors and $f_j$ are the corresponding eigenvalues.

Definition 11.3. An operator defined on the domain (11.4) by the formula (11.3) will
be called an operator of the type $\mathfrak{F}$.

In applications, $F$ is the operator of energy (of an oscillator system). In connection
with this, an important role will be played by the operator $\exp(-\theta F)$, $\theta > 0$, defined
by the relation

$$\exp(-\theta F)|\psi\rangle = \sum_j \exp(-\theta f_j)\, |e_j\rangle \langle e_j | \psi \rangle, \tag{11.5}$$

which is a bounded positive operator of the type $\mathfrak{F}$.

Let $F$ be an operator of the type $\mathfrak{F}$. For an arbitrary density operator $S$, we define
the expectation

$$\mathrm{Tr}\, SF = \sum_{j=1}^{\infty} f_j\, \langle e_j | S | e_j \rangle \le +\infty, \tag{11.6}$$

which is well defined, with values in $(-\infty, +\infty]$.

Lemma 11.4. The functional $S \to \mathrm{Tr}\, SF$ is affine lower semicontinuous on the set
$\mathfrak{S}(\mathcal{H})$.

Proof. Affinity is obvious. Next, we will use the fact that the least upper bound of a
family of continuous functions is lower semicontinuous. We have

$$\mathrm{Tr}\, SF = \sup_N \sum_{j=1}^{N} f_j\, \langle e_j | S | e_j \rangle,$$

where all the finite sums are continuous. Hence, $S_n \to S$ implies

$$\liminf_{n \to \infty} \mathrm{Tr}\, S_n F \ge \mathrm{Tr}\, SF,$$

which means the lower semicontinuity. □

Lemma 11.5. Let the spectrum of the operator $F$ consist of the eigenvalues $f_n$ of
finite multiplicity and $\lim_{n \to \infty} f_n = +\infty$. In this case, the set

$$\mathfrak{S}_E = \{ S : \mathrm{Tr}\, SF \le E \} \tag{11.7}$$

is compact.

Proof. Without loss of generality, we assume that $f_n$ is monotonically nondecreasing,
and denote by $P_n$ the finite dimensional projector onto the eigenspace corresponding
to the first $n$ eigenvalues, so that $P_n \uparrow I$. The set $\mathfrak{S}_E$ is closed, due to Lemma 11.4.
Since $f_{n+1}(I - P_n) \le F$, we have $\mathrm{Tr}\, S(I - P_n) \le f_{n+1}^{-1} \mathrm{Tr}\, SF \le f_{n+1}^{-1} E \le \varepsilon$ for
large enough $n$ and for all $S \in \mathfrak{S}_E$. By the criterion of compactness of Theorem 11.2,
$\mathfrak{S}_E$ is compact. □

In applications this result implies the compactness of the set of quantum states with 
mean energy bounded by a certain constant. 
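
The estimate $\mathrm{Tr}\, S(I - P_n) \le E / f_{n+1}$ from the proof of Lemma 11.5 is a Markov-type inequality and can be illustrated numerically. The sketch below is ours: it assumes the oscillator-like spectrum $f_j = j$, truncated to 200 levels for the computation, and a state diagonal in the eigenbasis of $F$ (here a Gibbs state); none of these choices comes from the text.

```python
import numpy as np

def gibbs_state(f, theta):
    # Diagonal of exp(-theta F) / Tr exp(-theta F) in the eigenbasis of F.
    w = np.exp(-theta * (f - f.min()))
    return w / w.sum()

def tail_weight(p, n):
    """Tr S (I - P_n) for a diagonal state p; P_n projects on the first n levels."""
    return p[n:].sum()

def projector_rank(f, energy, eps):
    """Smallest n with E / f[n] <= eps (f nonnegative and nondecreasing)."""
    for n, fn in enumerate(f):
        if fn > 0 and energy / fn <= eps:
            return n
    raise ValueError("spectrum too short for this tolerance")

f = np.arange(200, dtype=float)        # assumed spectrum f_j = j, truncated
p = gibbs_state(f, theta=0.5)
energy = float(p @ f)                  # mean energy Tr SF
n = projector_rank(f, energy, eps=0.05)
print(n, tail_weight(p, n) <= 0.05)
```

The rank $n$ found in this way depends only on the energy bound $E$ and on $\varepsilon$, not on the particular state, which is exactly what the compactness criterion of Theorem 11.2 requires.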

11.2 Quantum entropy and relative entropy 

In the infinite-dimensional case, the von Neumann entropy $H(S)$ of a density operator
$S$ can be defined as in formula (5.7), although now it can take the value $+\infty$ if the
corresponding series diverges. The relative entropy can be correctly defined by using
formula (7.2). Most of the properties in Chapter 7 can be extended to this case, except 
for the continuity, which is weakened to the lower semicontinuity. 

Theorem 11.6. The quantum entropy and relative entropy are lower semicontinuous
on $\mathfrak{S}(\mathcal{H})$: Let $\{S_n\}$ (resp. $\{S'_n\}$) be a sequence of density operators in $\mathcal{H}$, converging
to a density operator $S$ (resp. $S'$). In this case,

$$H(S) \le \liminf_{n \to \infty} H(S_n),$$

$$H(S; S') \le \liminf_{n \to \infty} H(S_n; S'_n).$$

Proof. By inequality (11.2), we have $\|S_n - S\| \to 0$. The function $\eta(x) = -x \log x$
is continuous on the interval $[0,1]$. Hence, $\|\eta(S_n) - \eta(S)\| \to 0$. Indeed, $\eta(x)$ can be
uniformly approximated by a polynomial on $[0,1]$, and then we can use the following
estimate.

Exercise 11.7. For a polynomial $f$ and arbitrary operators $A$, $B$ of norm less
than or equal to one

$$\|f(A) - f(B)\| \le c_f \|A - B\|,$$

where the constant depends only on $f$. Hint: use the decomposition

$$A^k - B^k = \sum_{l=0}^{k-1} A^{k-l-1} (A - B) B^l .$$
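
The decomposition in the hint is a telescoping sum, and for operators of norm at most one it gives $\|A^k - B^k\| \le k \|A - B\|$. A short numerical check (our own helper name; real matrices normalized to unit operator norm are used only for convenience):

```python
import numpy as np

def power_difference(a, b, k):
    """sum_{l=0}^{k-1} A^{k-l-1} (A - B) B^l, which should equal A^k - B^k."""
    mp = np.linalg.matrix_power
    return sum(mp(a, k - l - 1) @ (a - b) @ mp(b, l) for l in range(k))

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 4))
b = rng.normal(size=(4, 4))
a /= np.linalg.norm(a, 2)              # operator norm 1
b /= np.linalg.norm(b, 2)
k = 5
lhs = np.linalg.matrix_power(a, k) - np.linalg.matrix_power(b, k)
print(np.allclose(lhs, power_difference(a, b, k)))
print(np.linalg.norm(lhs, 2) <= k * np.linalg.norm(a - b, 2))
```

Summing such bounds over the monomials of $f$ yields a constant $c_f$ that depends only on the coefficients and degree of the polynomial.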


Therefore, for any finite-dimensional projector $P$, due to inequality (11.1) we have

$$|\operatorname{Tr} P(\eta(S_n) - \eta(S))| \le \operatorname{Tr} P\,\|\eta(S_n) - \eta(S)\| \to 0. \tag{11.8}$$

On the other hand, by the same inequality, for a Hermitian positive operator $A$,

$$\operatorname{Tr} PA \le \|P\| \operatorname{Tr} A,$$

and hence $\operatorname{Tr} A = \sup_P \operatorname{Tr} PA$, where $P$ runs over all finite-dimensional projectors.
Thus,

$$H(S) = \sup_P \operatorname{Tr} P\eta(S) \le \liminf_{n\to\infty} \sup_P \operatorname{Tr} P\eta(S_n) = \liminf_{n\to\infty} H(S_n).$$

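Similarly, by using representation (7.23) for the relative entropy, we get

$$H(S; S') = \sup_{\lambda > 0} \frac{1}{\lambda}\left[H(\lambda S + (1-\lambda)S') - \lambda H(S) - (1-\lambda)H(S')\right],$$

hence

$$H(S; S') = \sup_{P,\,\lambda > 0} \frac{1}{\lambda}\operatorname{Tr} P\left[\eta(\lambda S + (1-\lambda)S') - \lambda\eta(S) - (1-\lambda)\eta(S')\right]$$

$$\le \liminf_{n\to\infty} \sup_{P,\,\lambda > 0} \frac{1}{\lambda}\operatorname{Tr} P\left[\eta(\lambda S_n + (1-\lambda)S'_n) - \lambda\eta(S_n) - (1-\lambda)\eta(S'_n)\right]$$

$$= \liminf_{n\to\infty} H(S_n; S'_n).$$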


□ 


Lemma 11.8. Let $F$ be an operator of type $\mathfrak{F}$ satisfying the condition

$$\operatorname{Tr}\exp(-\theta F) < \infty \quad \text{for all } \theta > 0. \tag{11.9}$$

In this case, the quantum entropy $H(S)$ is bounded and continuous on the set $\mathfrak{S}_E$
defined in (11.7).

Notice that relation (11.9) implies that the operator $F$ satisfies the conditions of
Lemma 11.5, and hence the set $\mathfrak{S}_E$ is compact.

Proof. By introducing the density operator $S_\theta = \exp(-\theta F - c(\theta))$, where
$c(\theta) = \ln\operatorname{Tr}\exp(-\theta F)$, we have

$$H(S) = -H(S; S_\theta) + \theta\operatorname{Tr} SF + c(\theta), \tag{11.10}$$

where $H(S; S_\theta)$ is the relative entropy. Hence,

$$H(S) \le -H(S; S_\theta) + \theta E + c(\theta), \tag{11.11}$$

if $S \in \mathfrak{S}_E$. Since the relative entropy is nonnegative, the entropy $H(S)$ is bounded
on $\mathfrak{S}_E$. Since $H(S)$, by Theorem 11.6, is lower semicontinuous, it is sufficient to show
that it is also upper semicontinuous, and hence continuous, on the compact set $\mathfrak{S}_E$.
Let $\{S_n\} \subset \mathfrak{S}_E$ be a sequence of states weakly converging to $S$. From inequality (11.11)
applied to $S_n$, by lower semicontinuity of the relative entropy,

$$\limsup_{n\to\infty} H(S_n) \le -H(S; S_\theta) + \theta E + c(\theta) = H(S) + \theta(E - \operatorname{Tr} SF), \tag{11.12}$$

where in the last equality we used (11.10). Letting $\theta \to 0$, we obtain the upper
semicontinuity of $H(S)$. □

Notice that when $F$ is the operator of the energy, $S_\theta$ is the density operator of the
Gibbs equilibrium state at the inverse temperature $\theta$, and the function $-\theta^{-1}c(\theta)$ is
the free energy. The relation (11.10) and the non-negativity of the relative entropy
imply

$$H(S) \le \theta\operatorname{Tr} SF + c(\theta), \tag{11.13}$$

which is equivalent to the Gibbs variational principle.
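
Identity (11.10) and inequality (11.13) are easy to test in a finite-dimensional toy model. The sketch below is an illustration only, assuming NumPy; the operator `F`, the dimension and the helper names are hypothetical choices, and natural logarithms are used throughout.

```python
import numpy as np

def entropy(S):
    """von Neumann entropy H(S) = -Tr S ln S (natural logarithm)."""
    p = np.linalg.eigvalsh(S)
    p = p[p > 1e-15]
    return float(-np.sum(p * np.log(p)))

def herm_log(S):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.conj().T

def relative_entropy(S, T):
    """H(S; T) = Tr S (ln S - ln T) for full-rank S and T."""
    return float(np.real(np.trace(S @ (herm_log(S) - herm_log(T)))))

def gibbs_state(F, theta):
    """Return S_theta = exp(-theta F - c(theta)) and c(theta) = ln Tr exp(-theta F)."""
    w, V = np.linalg.eigh(F)
    weights = np.exp(-theta * w)
    c = float(np.log(weights.sum()))
    return (V * (weights / weights.sum())) @ V.conj().T, c

rng = np.random.default_rng(1)
d, theta = 5, 0.7
F = np.diag(np.arange(d, dtype=float))      # toy "energy" operator (an assumption)
G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
S = G @ G.conj().T
S /= np.trace(S).real                        # random full-rank density matrix

S_theta, c = gibbs_state(F, theta)
mean_F = float(np.real(np.trace(S @ F)))
rhs_10 = -relative_entropy(S, S_theta) + theta * mean_F + c   # right side of (11.10)
gap_10 = abs(entropy(S) - rhs_10)
gibbs_ok = entropy(S) <= theta * mean_F + c + 1e-12           # inequality (11.13)
print(gap_10, gibbs_ok)
```

Equality in (11.13) holds exactly when $S = S_\theta$, which is the finite-dimensional form of the variational principle.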

11.3 Constrained c-q channel 

Let $X = \{x\}$ be an infinite alphabet. For every $x$, let $S_x$ be a density operator in
$\mathcal{H}$ with finite von Neumann entropy $H(S_x)$. The map $x \to S_x$ will be called a c-q
channel with input alphabet $X$. Let $f(x)$ be a function defined on $X$, taking values in
$[0, +\infty]$. We shall consider the class $\mathcal{P}_E$ of finitely supported probability distributions
$\pi = \{\pi_x\}$ on $X$ satisfying the condition

$$\sum_x f(x)\pi_x \le E, \tag{11.14}$$

where $E$ is a positive real number. We assume that $\mathcal{P}_E$ is nonempty and impose the
following condition on the channel:

$$\sup_{\pi\in\mathcal{P}_E} H\left(\sum_x \pi_x S_x\right) < \infty. \tag{11.15}$$

Definition 11.9. We define a code $(W, M)$ of size $N$ and length $n$ as in Definition 5.1,
with the additional requirement that all codewords $w = (x_1, \dots, x_n) \in W$ satisfy the
additive constraint

$$f(x_1) + \dots + f(x_n) \le nE. \tag{11.16}$$

The average error $P_e(W, M)$ and the minimal error $p_e(n, N)$ of the code are given
by formulas (5.5) and (5.6), respectively. The constrained classical capacity of the channel
$x \to S_x$ is defined as in Definition 5.2, i.e. as the supremum of achievable rates $R$
such that $\lim_{n\to\infty} p_e(n, 2^{nR}) = 0$.
Theorem 11.10. The constrained classical capacity $C$ of a c-q channel $x \to S_x$ that
satisfies condition (11.15), with input constraint (11.16), is equal to the quantity

$$C_\chi = \sup_{\pi\in\mathcal{P}_E}\left[H\left(\sum_x \pi_x S_x\right) - \sum_x \pi_x H(S_x)\right]. \tag{11.17}$$
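
To make the quantity in (11.17) concrete, the following sketch evaluates the bracket for a finite ensemble and performs a crude grid search over distributions obeying (11.14). It is an illustration under stated assumptions, not the method of the text: NumPy is assumed, and the three qubit states, the cost values `f_values` and the budget `E` are hypothetical toy data. Entropies are measured in bits.

```python
import numpy as np

def entropy_bits(S):
    """von Neumann entropy in bits."""
    p = np.linalg.eigvalsh(S)
    p = p[p > 1e-15]
    return float(-np.sum(p * np.log2(p)))

def chi(probs, states):
    """H(sum_x pi_x S_x) - sum_x pi_x H(S_x), the bracket in (11.17)."""
    avg = sum(p * S for p, S in zip(probs, states))
    return entropy_bits(avg) - sum(p * entropy_bits(S) for p, S in zip(probs, states))

def pure(v):
    """Density matrix of the normalized vector v."""
    v = np.asarray(v, dtype=complex)
    v = v / np.linalg.norm(v)
    return np.outer(v, v.conj())

# Hypothetical toy data: three qubit states with costs f(x) and budget E.
states = [pure([1, 0]), pure([1, 1]), pure([0, 1])]
f_values = [0.0, 1.0, 2.0]
E = 1.0

def constrained_chi_on_grid(steps=20):
    """Grid search over distributions pi with sum_x f(x) pi_x <= E."""
    best, best_pi = -1.0, None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            pi = [i / steps, j / steps, (steps - i - j) / steps]
            if sum(p * f for p, f in zip(pi, f_values)) <= E + 1e-12:
                value = chi(pi, states)
                if value > best:
                    best, best_pi = value, pi
    return best, best_pi

best, best_pi = constrained_chi_on_grid()
print(best, best_pi)
```

A grid search only gives a lower estimate of the supremum; for larger alphabets a convex optimization routine would be the natural replacement, since the bracket is concave in $\pi$.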


Proof. We denote by $\mathcal{P}_E^{(n)}$ the class of probability distributions $\pi$ on $X^n$ satisfying the
condition

$$\sum_{x_1,\dots,x_n}\left[f(x_1) + \dots + f(x_n)\right]\pi_{x_1,\dots,x_n} \le nE \tag{11.18}$$

and define the quantity $C_\chi^{(n)}$ as in (5.30), with the modification that the supremum
with respect to $\pi$ is taken over $\mathcal{P}_E^{(n)}$, i.e.

$$C_\chi^{(n)} = \sup_{\pi\in\mathcal{P}_E^{(n)}} \chi(\{\pi_w\}; \{S_w\}).$$


Lemma 11.11. The sequence $\{C_\chi^{(n)}\}$ is additive, i.e. $C_\chi^{(n)} = nC_\chi$.

Proof. By (5.31), we have

$$\chi_n(\pi) \le \sum_{k=1}^n \chi(\pi^{(k)}),$$

where $\pi^{(k)}$ is the $k$-th marginal distribution of $\pi$ on $X$. Also,

$$\sum_{k=1}^n \chi(\pi^{(k)}) \le n\chi(\bar\pi), \tag{11.19}$$

where $\bar\pi = \frac{1}{n}\sum_{k=1}^n \pi^{(k)}$, since $\chi(\pi)$ is a concave function of $\pi$ (as follows from
concavity of the von Neumann entropy). Inequality (11.18) can be rewritten as

$$\frac{1}{n}\sum_{k=1}^n \sum_x f(x)\pi^{(k)}(x) \le E,$$

which implies that $\bar\pi \in \mathcal{P}_E$ if $\pi \in \mathcal{P}_E^{(n)}$. Taking the supremum in relation (11.19)
with respect to $\pi \in \mathcal{P}_E^{(n)}$, we obtain $C_\chi^{(n)} \le nC_\chi$. The converse inequality is obvious.

□


The proof of the Converse Coding Theorem can be based on the following corollary
of the Fano inequality (cf. (5.33)),

$$P_e(W, M) \ge 1 - \frac{\sup_{\pi\in\mathcal{P}_E^{(n)}}\sup_M \mathcal{I}_n(\pi, M)}{nR} - \frac{1}{nR}, \tag{11.20}$$

which is obtained as follows. Let again the words in the code $(W, M)$ be taken with
the input distribution $\pi^{(N)}$ assigning equal probability $1/N$ to each word. Consider
inequality (4.37). Since the words in the code satisfy (11.16), we have $\pi^{(N)} \in \mathcal{P}_E^{(n)}$,
and hence (11.20) follows.

Using inequality (11.20), Theorem 5.9 and Lemma 11.11, we obtain

$$p_e(n, 2^{nR}) \ge 1 - \frac{C_\chi}{R} - \frac{1}{nR},$$

where $C_\chi$ is defined by (11.17), which implies $p_e(n, 2^{nR}) \not\to 0$ for $R > C_\chi$.

In classical information theory, the Direct Coding Theorem for channels with
additive constraints can be proved by using random coding with a probability
distribution (5.47) modified by a factor concentrated on words for which the constraint
holds close to equality. The same tool can be applied to a c-q channel. Let $\pi$ be a
distribution on $X$ satisfying (11.14), and let $\mathsf{P}$ be a distribution on the set of $N$ words,
under which the words are independent and have the probability distribution (5.47).
Let $\nu_n = \mathsf{P}\left(\frac{1}{n}\sum_{k=1}^n f(x_k) \le E\right)$ and define the modified distribution $\tilde{\mathsf{P}}$ under which
the words are still independent, but

$$\tilde{\mathsf{P}}(w = (x_1, \dots, x_n)) = \begin{cases} \nu_n^{-1}\pi_{x_1}\cdots\pi_{x_n}, & \text{if } \sum_{k=1}^n f(x_k) \le nE, \\ 0, & \text{otherwise.} \end{cases} \tag{11.21}$$

Let us remark that since $\pi \in \mathcal{P}_E$, then $\mathsf{E}f \le E$ (where $\mathsf{E}$ is the expectation
corresponding to $\mathsf{P}$) and hence, by the Central Limit Theorem,

$$\lim_{n\to\infty} \nu_n \ge 1/2.$$

Therefore, $\tilde{\mathsf{E}}\xi \le 2^m\,\mathsf{E}\xi$ (where $\tilde{\mathsf{E}}$ is the expectation corresponding to $\tilde{\mathsf{P}}$) for any
nonnegative random variable $\xi$ depending on $m$ different words.
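
The modified distribution (11.21) is the i.i.d. distribution conditioned on the constraint, so it can be sampled by rejection. The sketch below illustrates this and gives a Monte Carlo estimate of $\nu_n$; it uses only the Python standard library, and the alphabet, the costs `f`, the distribution `probs` and the function names are hypothetical toy choices rather than anything fixed by the text.

```python
import random

def sample_constrained_word(alphabet, probs, f, n, E, rng):
    """Draw i.i.d. words until f(x_1) + ... + f(x_n) <= n E; return word and draw count."""
    attempts = 0
    while True:
        attempts += 1
        word = rng.choices(alphabet, weights=probs, k=n)
        if sum(f[x] for x in word) <= n * E:
            return word, attempts

def estimate_nu(alphabet, probs, f, n, E, rng, trials=20000):
    """Monte Carlo estimate of nu_n = P(f(x_1) + ... + f(x_n) <= n E)."""
    hits = 0
    for _ in range(trials):
        word = rng.choices(alphabet, weights=probs, k=n)
        if sum(f[x] for x in word) <= n * E:
            hits += 1
    return hits / trials

rng = random.Random(0)
alphabet = [0, 1, 2]
f = {0: 0.0, 1: 1.0, 2: 2.0}       # toy cost function
probs = [0.25, 0.5, 0.25]          # mean cost equals E, the borderline case
n, E = 50, 1.0

word, attempts = sample_constrained_word(alphabet, probs, f, n, E, rng)
nu_hat = estimate_nu(alphabet, probs, f, n, E, rng)
print(attempts, nu_hat)
```

Because $\nu_n$ stays bounded away from zero, the expected number of draws per accepted word stays bounded as $n$ grows, which is the sampling counterpart of the factor $2^m$ in the estimate above.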

For the error probability $P_e(W, M)$, we have the basic upper bound (5.52). Now,
take the expectation of this bound with respect to $\tilde{\mathsf{P}}$. Since every term in the right
hand side of (5.52) depends on no more than two different words, we have

$$\tilde{\mathsf{E}}\inf_M P_e(W, M) \le 4\,\mathsf{E}\inf_M P_e(W, M),$$

and the expectation with respect to $\mathsf{P}$ can be made arbitrarily small for $N = 2^{nR}$,
$n \to \infty$, with $R < C_\chi - 3\delta$. Thus, $\tilde{\mathsf{E}}P_e(W, M)$ can also be made arbitrarily small under
the same circumstances. Since the distribution $\tilde{\mathsf{P}}$ is concentrated on words that
satisfy (11.16), we can choose a code for which $P_e(W, M)$ can be made arbitrarily small
for sufficiently large $n$. □
11.4 Classical-quantum channel with continuous alphabet

Let the input alphabet $X$ be a complete, separable metric space with the $\sigma$-algebra of
Borel subsets. In applications, $X$ is typically a domain in $\mathbb{R}^k$ that is locally compact.
However, we need the more general case for the treatment of infinite-dimensional
quantum channels in the next section. We consider the c-q channel given by a
continuous mapping $x \to S_x$ from the alphabet $X$ to the set of quantum states $\mathfrak{S}(\mathcal{H})$
(by Lemma 11.1, for continuity it is necessary and sufficient that all matrix elements
$\langle\psi|S_x|\varphi\rangle$; $\psi, \varphi \in \mathcal{H}$, are continuous with respect to the metric in $X$). We will obtain
a convenient integral expression for the classical capacity $C$ of the channel $x \to S_x$
under the constraint (11.16).

We assume that $f$ is a nonnegative Borel function on $X$. Consider the set $\mathcal{P}_E^B$ of
Borel probability measures $\pi$ on $X$ satisfying

$$\int_X f(x)\pi(dx) \le E. \tag{11.22}$$

Recall that a sequence of Borel probability measures $\{\pi_n\}$ converges weakly to $\pi$ if

$$\int_X \varphi(x)\pi_n(dx) \to \int_X \varphi(x)\pi(dx)$$

for arbitrary $\varphi \in C(X)$, the Banach space of bounded continuous functions on $X$ (see
e.g. Parthasarathy’s book [164]).

Exercise 11.12. Let $f$ be lower semicontinuous. In this case, the functional
$\pi \to \int_X f(x)\pi(dx)$ is lower semicontinuous with respect to the weak convergence of
probability measures.

Hint: use the fact that $f$ is the least upper bound of the family of all bounded
continuous functions $\varphi \le f$.


We will need the following auxiliary result, the proof of which can be found in [164]:

Lemma 11.13. Let $f$ be lower semicontinuous. In this case, the set $\mathcal{P}_E$ of all finitely
supported probability measures that satisfy (11.22) is weakly dense in $\mathcal{P}_E^B$.

Let us also give a convenient sufficient condition of weak compactness of the set
$\mathcal{P}_E^B$.

Lemma 11.14. Assume that $f$ is lower semicontinuous and for any positive $k$ the set
$\{x : f(x) \le k\} \subset X$ is compact. In this case, the subset of probability measures

$$\mathcal{P}_E^B = \left\{\pi : \int_X f(x)\pi(dx) \le E\right\}$$

is weakly compact.
Proof. The lower semicontinuity of $f$ implies that the functional $\pi \to \int_X f(x)\pi(dx)$
is lower semicontinuous with respect to the weak convergence of probability measures
(Exercise 11.12). Hence, the set $\mathcal{P}_E^B$ is weakly closed. It is known that a weakly
closed subset $\mathcal{P}$ of Borel probability measures on $X$ is weakly compact if and only
if for any $\varepsilon > 0$ there is a compact $K \subset X$ such that its complement satisfies
$\pi(X \setminus K) \le \varepsilon$ for all $\pi \in \mathcal{P}$ (see e.g. [164]). For a given $\varepsilon > 0$, consider the
compact set $K = \{x : f(x) \le E/\varepsilon\}$. Then,

$$\pi(X \setminus K) \le \frac{\varepsilon}{E}\int_{X\setminus K} f(x)\pi(dx) \le \varepsilon$$

for $\pi \in \mathcal{P}_E^B$. Hence, the set $\mathcal{P}_E^B$ is weakly compact. □

For an arbitrary Borel probability measure $\pi$ on $X$, we introduce the average state

$$\bar S_\pi = \int_X S_x\,\pi(dx), \tag{11.23}$$

where, by assumption, the function $S_x$ is continuous, the integral is well defined, and
represents a density operator in $\mathcal{H}$.

Assuming $H(\bar S_\pi) < \infty$, consider the functional

$$\chi(\pi) = H(\bar S_\pi) - \int_X H(S_x)\pi(dx), \tag{11.24}$$

where, by Theorem 11.6, the function $H(S_x)$ is nonnegative and lower semicontinuous,
and hence the integral is well defined.

Theorem 11.15. Let there exist a self-adjoint operator $F$ of the type $\mathfrak{F}$ satisfying
condition (11.9), such that

$$f(x) \ge \operatorname{Tr} S_x F; \quad x \in X. \tag{11.25}$$

In this case, condition (11.15) holds and the classical capacity $C$ of the channel
$x \to S_x$ with the constraint (11.16) is finite, and is given by the relation (11.17).

If, moreover, the function $f$ satisfies the conditions of Lemma 11.14, then

$$C = \max_{\pi\in\mathcal{P}_E^B} \chi(\pi). \tag{11.26}$$

Proof. By integrating (11.25), we obtain

$$\operatorname{Tr} \bar S_\pi F \le E \tag{11.27}$$

for $\pi \in \mathcal{P}_E^B$. Hence, by (11.13),

$$H(\bar S_\pi) \le \theta\operatorname{Tr}\bar S_\pi F + c(\theta) \le \theta E + c(\theta).$$

Therefore, condition (11.15) is fulfilled, $C$ is finite and equal to (11.17).

If the function $f$ satisfies the conditions of Lemma 11.14, the set $\mathcal{P}_E^B$ is weakly
compact. Let us show that the functional $\pi \to \chi(\pi)$ is upper semicontinuous. From
expression (11.17) and Lemma 11.13 it then follows that

$$C = \sup_{\pi\in\mathcal{P}_E} \chi(\pi) = \sup_{\pi\in\mathcal{P}_E^B} \chi(\pi).$$

Moreover, the last supremum is achieved by upper semicontinuity of $\chi(\pi)$ and
compactness of $\mathcal{P}_E^B$.

Consider the first term in expression (11.24) for $\chi(\pi)$. Since the matrix elements
of the family $\{S_x\}$ depend continuously on $x$, the mapping $\pi \to \bar S_\pi$ into the set $\mathfrak{S}_E$,
equipped with the weak operator topology, is continuous with respect to the weak
convergence of probability measures. By Lemma 11.1, the weak operator topology
on $\mathfrak{S}(\mathcal{H})$ is equivalent to the trace norm topology. By Lemma 11.8, the entropy
is continuous on $\mathfrak{S}_E$. Therefore, the functional $\pi \to H(\bar S_\pi)$ is continuous on the
set $\mathcal{P}_E^B$.

The function $x \to H(S_x)$ is lower semicontinuous. Hence, by Exercise 11.12, the
second term in (11.24) is upper semicontinuous in $\pi$. Thus, the functional (11.24) is
also upper semicontinuous. □

11.5 Constrained quantum channel 

Let $\mathcal{H}_A$, $\mathcal{H}_B$ be separable Hilbert spaces called the input and output spaces,
respectively.

Definition 11.16. By a channel we mean a linear, bounded, trace-preserving, completely
positive map $\Phi : \mathfrak{T}(\mathcal{H}_A) \to \mathfrak{T}(\mathcal{H}_B)$. As in the finite-dimensional case, complete
positivity means that, for any $n = 1, 2, \dots$, the map $\Phi \otimes \mathrm{Id}_n$ is positive.

Exercise 11.17. Any affine map $\Phi_{\mathfrak{S}} : \mathfrak{S}(\mathcal{H}_A) \to \mathfrak{S}(\mathcal{H}_B)$ extends uniquely to a
linear, bounded, positive map $\Phi : \mathfrak{T}(\mathcal{H}_A) \to \mathfrak{T}(\mathcal{H}_B)$. Hint: to define the linear
extension, use the construction of Exercise 2.7. To prove boundedness, use the
argument from the proof of Lemma 10.12 (estimates of the type (10.22), (10.23) 
for mnh). 

Example 11.18. Let $x \to S_x$ be a c-q channel with the input alphabet $X$, which is a
complete separable metric space. Then it can be extended to a quantum channel in the
sense of Definition 11.16 as follows. Consider the Hilbert space $\mathcal{H}_A = L^2(X, m)$,
where $m$ is a $\sigma$-finite measure on $X$. Assume that $S_x$ is an $m$-measurable family
of density operators in $\mathcal{H}_B$. Any operator $S \in \mathfrak{T}(\mathcal{H}_A)$ is given by a kernel with a
correctly defined diagonal value $\langle x|S|x\rangle$. Then the relation

$$\Phi[S] = \int_X \langle x|S|x\rangle S_x\,m(dx)$$

defines a channel $\Phi : \mathfrak{T}(\mathcal{H}_A) \to \mathfrak{T}(\mathcal{H}_B)$.
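
A finite analogue of Example 11.18 may help to see why the construction gives a channel. In the sketch below, which is an illustration and not part of the original text, $X$ is a three-letter alphabet with counting measure, so that $\mathcal{H}_A = \mathbb{C}^3$ and the integral becomes a sum. NumPy is assumed, and the output states and function names are hypothetical. The code exhibits Kraus operators for the map, which shows complete positivity, and checks trace preservation.

```python
import numpy as np

def cq_channel(S, output_states):
    """Phi[S] = sum_x <x|S|x> S_x for a finite alphabet with counting measure."""
    diag = np.real(np.diag(S))
    return sum(p * Sx for p, Sx in zip(diag, output_states))

def kraus_operators(output_states):
    """Kraus operators sqrt(mu) |phi><x| from the spectral decomposition of each S_x."""
    d_in = len(output_states)
    ops = []
    for x, Sx in enumerate(output_states):
        weights, vectors = np.linalg.eigh(Sx)
        for mu, phi in zip(weights, vectors.T):
            if mu > 1e-15:
                bra_x = np.zeros(d_in)
                bra_x[x] = 1.0
                ops.append(np.sqrt(mu) * np.outer(phi, bra_x))
    return ops

rng = np.random.default_rng(3)
# Toy output states on a qubit: |0><0|, |+><+| and the maximally mixed state.
output_states = [np.diag([1.0, 0.0]), np.full((2, 2), 0.5), np.eye(2) / 2]
G = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
S = G @ G.conj().T
S /= np.trace(S).real                      # random input density matrix on C^3

direct = cq_channel(S, output_states)
kraus = kraus_operators(output_states)
via_kraus = sum(K @ S @ K.conj().T for K in kraus)
completeness = sum(K.conj().T @ K for K in kraus)
print(np.allclose(direct, via_kraus), np.allclose(completeness, np.eye(3)))
```

Only the diagonal of the input enters the output, which is the finite counterpart of the diagonal value $\langle x|S|x\rangle$ of the kernel.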

In this section, we study the classical capacity of a channel under an additive
constraint at its input. Let $F$ be an operator of type $\mathfrak{F}$ in $\mathcal{H}_A$, representing an observable,
the mean value of which is to be constrained. We assume that the input states $S^{(n)}$ of
the composite channel $\Phi^{\otimes n}$ are subject to the additive constraint

$$\operatorname{Tr} S^{(n)}F^{(n)} \le nE, \tag{11.28}$$

where

$$F^{(n)} = F \otimes \cdots \otimes I + \cdots + I \otimes \cdots \otimes F,$$

and $E$ is a positive constant. Adapting Definition 8.1 and the proof of Proposition 8.2
to the case of the input states constrained by (11.28), one can prove

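Proposition 11.19. Let the channel $\Phi$ satisfy the condition

$$\sup_{S:\operatorname{Tr} SF \le E} H(\Phi[S]) < \infty. \tag{11.29}$$

Then the classical capacity of this channel, under the constraint (11.28), is finite and
is equal to

$$C(\Phi, F, E) = \lim_{n\to\infty} \frac{1}{n} C_\chi(\Phi^{\otimes n}, F^{(n)}, nE), \tag{11.30}$$

where

$$C_\chi(\Phi, F, E) = \sup_{\pi:\operatorname{Tr}\bar S_\pi F \le E}\left[H(\Phi[\bar S_\pi]) - \sum_i \pi_i H(\Phi[S_i])\right]. \tag{11.31}$$

Here, $\bar S_\pi = \sum_i \pi_i S_i$ is the average state of the ensemble $\pi = \{\pi_i, S_i\}$.
The finiteness of the capacity follows from the following lemma.

Lemma 11.20. Condition (11.29) for the channel $\Phi$ implies a similar condition for
the channel $\Phi^{\otimes n}$, namely

$$\sup_{S^{(n)}:\operatorname{Tr} S^{(n)}F^{(n)} \le nE} H(\Phi^{\otimes n}[S^{(n)}]) \le n\sup_{S:\operatorname{Tr} SF \le E} H(\Phi[S]). \tag{11.32}$$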
Proof. Denoting by $S_j$ the partial state of $S^{(n)}$ in the $j$-th tensor factor of $\mathcal{H}_A^{\otimes n}$, and
letting $\bar S = \frac{1}{n}\sum_{j=1}^n S_j$, we have

$$H(\Phi^{\otimes n}[S^{(n)}]) \le \sum_{j=1}^n H(\Phi[S_j]) \le nH(\Phi[\bar S]),$$

where in the first inequality we use subadditivity of the quantum entropy, while in
the second we use its concavity. Moreover, $\operatorname{Tr}\bar S F = \frac{1}{n}\operatorname{Tr} S^{(n)}F^{(n)} \le E$ and
hence (11.32) follows. □
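
The identity $\operatorname{Tr}\bar S F = \frac{1}{n}\operatorname{Tr} S^{(n)}F^{(n)}$ used at the end of the proof can be checked numerically. The following sketch is only an illustration: it assumes NumPy, takes three qubits with a toy constraint operator, and the function names are hypothetical.

```python
import numpy as np
from functools import reduce

def additive_observable(F, n):
    """F^(n) = F (x) I (x) ... (x) I + ... + I (x) ... (x) I (x) F."""
    d = F.shape[0]
    I = np.eye(d)
    total = np.zeros((d ** n, d ** n), dtype=complex)
    for j in range(n):
        total += reduce(np.kron, [F if i == j else I for i in range(n)])
    return total

def partial_state(S_n, d, n, j):
    """Partial state of S_n in the j-th tensor factor (all other factors traced out)."""
    T = S_n.reshape([d] * (2 * n))
    T = np.moveaxis(T, [j, n + j], [0, 1])
    T = T.reshape(d, d, d ** (n - 1), d ** (n - 1))
    return np.trace(T, axis1=2, axis2=3)

rng = np.random.default_rng(2)
d, n = 2, 3
F = np.diag([0.0, 1.0])                    # toy constraint operator (an assumption)
G = rng.standard_normal((d ** n, d ** n)) + 1j * rng.standard_normal((d ** n, d ** n))
S_n = G @ G.conj().T
S_n /= np.trace(S_n).real                  # random, generally entangled, state

S_bar = sum(partial_state(S_n, d, n, j) for j in range(n)) / n
lhs = np.trace(S_bar @ F).real
rhs = np.trace(S_n @ additive_observable(F, n)).real / n
print(abs(lhs - rhs))
```

The check does not depend on the state being a product state, which is why the constraint (11.28) on $S^{(n)}$ passes to the averaged partial state.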

If the channel $\Phi$ satisfies the additivity property

$$C_\chi(\Phi^{\otimes n}, F^{(n)}, nE) = nC_\chi(\Phi, F, E), \tag{11.33}$$

then

$$C(\Phi, F, E) = C_\chi(\Phi, F, E).$$

This is closely related to the property of superadditivity of the convex hull of the
output entropy (8.36), which implies the additivity of the $\chi$-capacity under the linear
constraints (8.39) (see Section 8.3.2).

Exercise 11.21. Prove the following inequality:

$$C_\chi(\Phi^{\otimes n}, F^{(n)}, nE) \ge nC_\chi(\Phi, F, E). \tag{11.34}$$

In any case, the quantity (11.31) provides a lower bound for the classical capacity
$C(\Phi, F, E)$. We shall obtain a more convenient expression for $C_\chi(\Phi, F, E)$, by using
the results of the previous section. Consider a c-q channel with the alphabet
$X = \mathfrak{S}(\mathcal{H}_A)$ defined by the mapping $S \to \Phi[S]$. The constraint is given by the function
$f(S) = \operatorname{Tr} SF$, which is affine and lower semicontinuous by Lemma 11.4, whereas
the condition (11.15) turns into (11.29).

Definition 11.22. We call a generalized ensemble an arbitrary Borel probability
measure $\pi$ on $\mathfrak{S}(\mathcal{H}_A)$. The average state of the generalized ensemble $\pi$ is defined as the
barycenter of the probability measure

$$\bar S_\pi = \int_{\mathfrak{S}(\mathcal{H}_A)} S\,\pi(dS).$$

The conventional ensembles correspond to finitely supported measures.

Theorem 11.6 implies, in particular, that the nonnegative function $S \to H(\Phi[S])$
is measurable. Hence, under the condition $H(\Phi[\bar S_\pi]) < \infty$, the functional

$$\chi_\Phi(\pi) = H(\Phi[\bar S_\pi]) - \int_{\mathfrak{S}(\mathcal{H}_A)} H(\Phi[S])\pi(dS) \tag{11.35}$$

is well defined.

Exercise 11.23. The quantity $\chi_\Phi(\pi)$ is a concave functional of $\pi$. Hint: The
second term is affine, the concavity of the first term follows from the concavity
of the entropy.

Corollary 11.24. Let there exist a positive operator $F'$ in $\mathcal{H}_B$, such that

$$\operatorname{Tr}\exp(-\theta F') < +\infty \quad \text{for all } \theta > 0 \tag{11.36}$$

and

$$\operatorname{Tr}\Phi[S]F' \le \operatorname{Tr} SF; \quad S \in \mathfrak{S}(\mathcal{H}_A). \tag{11.37}$$

In this case, condition (11.29) is fulfilled and

$$C_\chi(\Phi, F, E) = \sup_{\pi:\operatorname{Tr}\bar S_\pi F \le E} \chi_\Phi(\pi).$$

Moreover, the output entropy $H(\Phi[S])$ is continuous on the compact set
$\mathfrak{S}_E = \{S : \operatorname{Tr} SF \le E\}$.

If, additionally, $F$ satisfies the conditions of Lemma 11.5, then there exists an
optimal generalized ensemble, i.e.

$$C_\chi(\Phi, F, E) = \max_{\pi:\operatorname{Tr}\bar S_\pi F \le E} \chi_\Phi(\pi). \tag{11.38}$$

Proof. The statement is obtained by applying Theorem 11.15 to the c-q channel
$S \to \Phi[S]$, with the constraint function $f(S) = \operatorname{Tr} SF$; the role of condition (11.25)
is played by (11.37), and the operator $F'$ plays the role of $F$. The function
$f(S) = \operatorname{Tr} SF$ satisfies the conditions of Lemma 11.14, due to Lemmas 11.4, 11.5.

The continuity of the entropy $H(\Phi[S])$ follows from the continuity of the map
$S \to S' = \Phi[S]$ and Lemma 11.8, which guarantees continuity of $H(S')$ on the
compact set $\{S' : \operatorname{Tr} S'F' \le E\}$. □

11.6 Entanglement-assisted capacity of constrained channels

Let systems $A$ and $B$ share an entangled (pure) state $S_{AB}$. We assume that the amount
of entanglement is unlimited but finite, i.e. $H(S_A) = H(S_B) < \infty$. By generalizing
the result of Section 9, one can prove
Proposition 11.25. Let $\Phi$ be a channel that satisfies the condition (11.29) with the
operator $F$ satisfying (11.9). In this case, its entanglement-assisted classical capacity
under the constraint (11.28) is finite and equals

$$C_{ea}(\Phi) = \sup_{S:\operatorname{Tr} SF \le E} I(S, \Phi), \tag{11.39}$$

where

$$I(S, \Phi) = H(S) + H(\Phi[S]) - H(S; \Phi), \tag{11.40}$$

and $H(S; \Phi)$ is the entropy exchange.
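
In finite dimensions, the three terms of (11.40) can be computed directly from a Kraus representation of the channel. The sketch below is a hedged illustration, not the book's method: it assumes NumPy and a channel given by a list of Kraus operators `kraus` (a hypothetical input format), and it uses the standard fact that the entropy exchange equals the entropy of the environment matrix with entries $\operatorname{Tr} K_i S K_j^*$. Natural logarithms are used.

```python
import numpy as np

def entropy(S):
    """von Neumann entropy with natural logarithm."""
    p = np.linalg.eigvalsh(S)
    p = p[p > 1e-15]
    return float(-np.sum(p * np.log(p)))

def apply_channel(S, kraus):
    """Phi[S] = sum_i K_i S K_i^*."""
    return sum(K @ S @ K.conj().T for K in kraus)

def entropy_exchange(S, kraus):
    """H(S; Phi): entropy of the environment state W_ij = Tr K_i S K_j^*."""
    W = np.array([[np.trace(Ki @ S @ Kj.conj().T) for Kj in kraus] for Ki in kraus])
    return entropy(W)

def mutual_information(S, kraus):
    """I(S, Phi) = H(S) + H(Phi[S]) - H(S; Phi), as in (11.40)."""
    return entropy(S) + entropy(apply_channel(S, kraus)) - entropy_exchange(S, kraus)

# Two toy qubit channels: the identity and the completely depolarizing channel.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
identity_kraus = [I2]
depolarizing_kraus = [0.5 * P for P in (I2, X, Y, Z)]

S = I2 / 2
info_identity = mutual_information(S, identity_kraus)
info_depolarizing = mutual_information(S, depolarizing_kraus)
print(info_identity, info_depolarizing)
```

For the noiseless qubit channel the value at the maximally mixed input is $2\ln 2$, while the completely depolarizing channel gives zero, in line with the interpretation of $I(S, \Phi)$ as a mutual information.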

Now, we investigate when the supremum in the right hand side of
(11.39) is achieved. Note that condition (11.9) implies that $F$ satisfies the condition
of Lemma 11.5 and, hence, the set $\mathfrak{S}_E = \{S : \operatorname{Tr} SF \le E\}$ is compact.

Proposition 11.26. Let the constraint operator $F$ satisfy the condition (11.9), and let
there exist a self-adjoint operator $F'$ of the type $\mathfrak{F}$ satisfying (11.9) and (11.37). In
this case,

$$C_{ea}(\Phi) = \max_{S:\operatorname{Tr} SF \le E} I(S, \Phi). \tag{11.41}$$

Moreover, if the channel $\Phi$ is such that

$$\sup_S I(S, \Phi) = \infty, \tag{11.42}$$

the maximum in (11.41) is achieved on a density operator $S$ that satisfies the
constraint with the equality $\operatorname{Tr} SF = E$.

Proof. We will consider each term in formula (11.40) separately. Notice that, by
Theorem 11.6, the quantum entropy is lower semicontinuous. Since the entropy exchange
can be represented as $H(S; \Phi) = H(\tilde\Phi[S])$, where $\tilde\Phi$ is the complementary channel
from the system space $\mathcal{H}_A$ to the environment space $\mathcal{H}_E$, it is also lower
semicontinuous and thus the last term in (11.40) is upper semicontinuous. Concerning the first
term, by Lemma 11.8, it is continuous on the set $\mathfrak{S}_E = \{S : \operatorname{Tr} SF \le E\}$ if the
constraint operator $F$ satisfies (11.9). By using the proof of Corollary 11.24, we can apply
a similar argument to the second term in (11.40), namely $H(\Phi[S])$. Moreover, this
corollary also implies that condition (11.29) and hence (11.39) hold. As follows from
the above, the mutual information (11.40) is upper semicontinuous on the compact set
$\mathfrak{S}_E$, and hence attains its maximum.

To prove the second statement, we consider

$$f_0(E) = \max_{\operatorname{Tr} SF = E} I(S, \Phi),$$

and assume that there exists a state $S_1$ such that

$$\operatorname{Tr} S_1F < E, \quad I(S_1, \Phi) > f_0(E).$$

Condition (11.42) implies that there exists a state $S_2$ such that

$$\operatorname{Tr} S_2F > E, \quad I(S_2, \Phi) > f_0(E).$$

Now, putting

$$\lambda = \frac{\operatorname{Tr} S_2F - E}{\operatorname{Tr} S_2F - \operatorname{Tr} S_1F},$$

we have $0 < \lambda < 1$ and $\operatorname{Tr}(\lambda S_1 + (1-\lambda)S_2)F = E$, so that

$$I(\lambda S_1 + (1-\lambda)S_2, \Phi) \le f_0(E) < \lambda I(S_1, \Phi) + (1-\lambda)I(S_2, \Phi),$$

which contradicts the concavity of $I(S, \Phi)$. □
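
For completeness, the two properties of the weight $\lambda$ used in the last step can be verified in one line each. The following short derivation is an added worked check in the notation of the proof, with $a_j = \operatorname{Tr} S_jF$ introduced here only as an abbreviation.

```latex
% Abbreviation (not in the original text): a_1 = Tr S_1 F < E < a_2 = Tr S_2 F.
\[
  \lambda = \frac{a_2 - E}{a_2 - a_1}, \qquad
  0 < a_2 - E < a_2 - a_1 \;\Longrightarrow\; 0 < \lambda < 1,
\]
\[
  \operatorname{Tr}\bigl(\lambda S_1 + (1-\lambda) S_2\bigr) F
  = a_2 - \lambda (a_2 - a_1)
  = a_2 - (a_2 - E)
  = E .
\]
```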

11.7 Entanglement-breaking channels in infinite dimensions

A state $S \in \mathfrak{S}(\mathcal{H}_1 \otimes \mathcal{H}_2)$ is called separable (unentangled) if it belongs to the convex
closure (i.e. closure of the convex hull) of the set of all product states
$S_1 \otimes S_2 \in \mathfrak{S}(\mathcal{H}_1 \otimes \mathcal{H}_2)$.

Proposition 11.27. A state $S$ is separable if and only if it admits a representation

$$S = \int_X S_1(x) \otimes S_2(x)\,\pi(dx), \tag{11.43}$$

where $\pi(dx)$ is a Borel probability measure and $S_j(x)$, $j = 1, 2$, are Borel
$\mathfrak{S}(\mathcal{H}_j)$-valued functions on some complete, separable metric space $X$.

Proof. According to the definition, the state $S$ is separable if and only if
$S = \lim_{n\to\infty} S_n$, where

$$S_n = \int_{\mathfrak{S}(\mathcal{H}_1)}\int_{\mathfrak{S}(\mathcal{H}_2)} S_1 \otimes S_2\,\pi_n(dS_1dS_2), \tag{11.44}$$

and $\{\pi_n\}$ is a sequence of Borel probability measures with finite supports.

Let $S$ be representable in the form (11.43). In this case, by making a change of
variable $x \to S_1(x) \otimes S_2(x)$, we can reduce the integral representation (11.43) to

$$S = \int_{\mathfrak{S}(\mathcal{H}_1)}\int_{\mathfrak{S}(\mathcal{H}_2)} S_1 \otimes S_2\,\pi(dS_1dS_2), \tag{11.45}$$

where we use the same notation $\pi$ for the image of the initial measure under the
change of variable. The mapping $\pi \to S$ is continuous under weak convergence of
probability measures. This is clear if the set of density operators is equipped with
the weak operator topology and, by Lemma 11.1, this topology coincides with the
trace norm topology on this set. By Lemma 11.13, there is a sequence $\{\pi_n\}$ of Borel
probability measures with finite supports, which converges weakly to $\pi$. By using the
continuity of the mapping $\pi \to S$, we obtain $S = \lim_{n\to\infty} S_n$, where $S_n$ is given by
relation (11.44). Hence, $S$ is separable.

Conversely, let $S$ be a separable state. In this case, $S = \lim_{n\to\infty} S_n$, where $S_n$
is given by (11.44). If we can prove that the sequence of measures $\{\pi_n\}$ is weakly
relatively compact, this will imply representation (11.45), where $\pi$ is a partial limit
of this sequence. The converging sequence $\{S_n\}$ is relatively compact. Therefore,
by Theorem 11.2, for any $\varepsilon > 0$ there is a finite-dimensional projection $P$ such that
$\operatorname{Tr} S_n(I - P) \le \varepsilon$ for all $n$. For $m = 1, 2, \dots$ denote by $P_m$ the projection such that
$\operatorname{Tr} S_n(I - P_m) \le 4^{-m}$, and hence

$$\int\int \operatorname{Tr}(S_1 \otimes S_2)(I - P_m)\,\pi_n(dS_1dS_2) \le 4^{-m}. \tag{11.46}$$

Introduce the following subsets in the direct product $\mathfrak{S}(\mathcal{H}_1) \times \mathfrak{S}(\mathcal{H}_2)$:

$$K_m = \{(S_1, S_2) : \operatorname{Tr}(S_1 \otimes S_2)(I - P_m) \le \delta^{-1}2^{-m}\}; \quad \mathcal{K}_\delta = \bigcap_{m\ge 1} K_m,$$

where $\delta > 0$. By the same Theorem 11.2, the set $\mathcal{K}_\delta$ is compact by construction. Its
complement $\overline{\mathcal{K}_\delta}$ satisfies

$$\pi_n(\overline{\mathcal{K}_\delta}) \le \sum_m \pi_n(\overline{K_m}) \le \delta\sum_m 2^m\int\int \operatorname{Tr}(S_1 \otimes S_2)(I - P_m)\,\pi_n(dS_1dS_2) \le \delta$$

according to (11.46). Thus, the sequence of measures $\{\pi_n\}$ satisfies the criterion of
weak relative compactness (see [164]), which completes the proof of the proposition. □

Definition 11.28. A channel $\Phi : \mathfrak{T}(\mathcal{H}_A) \to \mathfrak{T}(\mathcal{H}_B)$ is called entanglement-breaking
if for an arbitrary Hilbert space $\mathcal{H}_R$ and an arbitrary state $S \in \mathfrak{S}(\mathcal{H}_A \otimes \mathcal{H}_R)$, the
state $(\Phi \otimes \mathrm{Id}_R)[S] \in \mathfrak{S}(\mathcal{H}_B \otimes \mathcal{H}_R)$, where $\mathrm{Id}_R$ is the identity map in $\mathfrak{T}(\mathcal{H}_R)$, is
separable.

To describe the structure of such channels, we will need a generalization of the
notion of an observable (Definition 2.9) to the case of an arbitrary set of outcomes.

Definition 11.29. Let $X$ be a measurable space with a $\sigma$-algebra of measurable
subsets $\mathcal{B}$. An observable with values in $X$ is a probability operator-valued measure
(POVM) on $X$, i.e. a family $M = \{M(B), B \in \mathcal{B}\}$ of Hermitian operators in $\mathcal{H}$
satisfying the conditions

i. $M(B) \ge 0$; $B \in \mathcal{B}$

ii. $M(X) = I$

iii. for any countable decomposition $B = \cup B_j$ ($B_i \cap B_j = \emptyset$, $i \ne j$) one has
$M(B) = \sum_j M(B_j)$ in the sense of weak operator convergence

A probability operator-valued measure $E$ is called orthogonal (or a spectral
measure) if

$$E(B_1 \cap B_2) = E(B_1)E(B_2); \quad B_1, B_2 \in \mathcal{B}.$$

In this case, all operators $E(B)$ are projectors, and the projectors that correspond to
disjoint subsets are orthogonal. The corresponding observable is called sharp.

The probability distribution of the observable $M$ in the state $S$ is given by the
probability measure

$$\mu_S^M(B) = \operatorname{Tr} SM(B), \quad B \in \mathcal{B}. \tag{11.47}$$

Notice that the linear extension of the affine map $S \to \mu_S^M$ can be regarded as a
generalization of the notion of a quantum-classical (q-c) channel (see Section 6.4).

Exercise 11.30. Prove that formula (11.47) defines a probability measure on $\mathcal{B}$.
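
For a finite outcome set, formula (11.47) reduces to a list of numbers $\operatorname{Tr} SM_k$, and the claim of Exercise 11.30 becomes a direct check. The sketch below is an added illustration assuming NumPy; it uses the well-known three-outcome "trine" POVM on a qubit as toy data, and the function names are hypothetical.

```python
import numpy as np

def trine_povm():
    """Three-outcome qubit POVM with elements M_k = (2/3)|psi_k><psi_k|."""
    povm = []
    for k in range(3):
        psi = np.array([np.cos(np.pi * k / 3), np.sin(np.pi * k / 3)])
        povm.append((2.0 / 3.0) * np.outer(psi, psi))
    return povm

def is_povm(povm, tol=1e-10):
    """Check positivity of every element and normalization sum_k M_k = I."""
    d = povm[0].shape[0]
    positive = all(np.linalg.eigvalsh(M).min() >= -tol for M in povm)
    complete = np.allclose(sum(povm), np.eye(d), atol=tol)
    return positive and complete

def povm_distribution(S, povm):
    """mu_S^M evaluated on the atoms of the outcome set: probabilities Tr S M_k."""
    return np.array([np.real(np.trace(S @ M)) for M in povm])

povm = trine_povm()
S = np.diag([1.0, 0.0])                     # the pure state |0><0|
probs = povm_distribution(S, povm)
print(is_povm(povm), probs, probs.sum())
```

Positivity of the $M_k$ gives nonnegative probabilities and the normalization $\sum_k M_k = I$ gives total mass one; countable additivity is trivial for a finite outcome set.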

Theorem 11.31. The channel $\Phi$ is entanglement-breaking if and only if there is a
complete, separable metric space $X$, a Borel $\mathfrak{S}(\mathcal{H}_B)$-valued function $S_B(x)$, and an
observable $M$ in $\mathcal{H}_A$ with values in $X$ given by POVM $M(dx)$ such that

$$\Phi[S] = \int_X S_B(x)\,\mu_S^M(dx). \tag{11.48}$$

Relation (11.48) is a continuous version of representation (6.28), and to some extent
the statement can be regarded as an infinite-dimensional generalization of
Proposition 6.22, stating that an entanglement-breaking channel is a concatenation of a q-c
channel (measurement of observable $M$) and a c-q channel (preparation of states $S_B(x)$
depending on the measurement outcome). In the infinite-dimensional case, the
difference is that an observable with a non-discrete set of outcomes cannot define a quantum
channel in the sense of Definition 11.16. Indeed, in the finite-dimensional case, q-c
channels are characterized by the property that all the output states commute. Let $\Phi$
be an infinite-dimensional channel with this property. This can then always be
represented in the form (11.48), with a discrete set $X$. There exists a basis $\{|k\rangle\}$ such that
all commuting density operators $\Phi[S]$ are diagonal. In this case,

$$\Phi[S] = \sum_k |k\rangle\langle k|\Phi[S]|k\rangle\langle k| = \sum_k |k\rangle\langle k|\operatorname{Tr} SM_k,$$

where $M_k = \Phi^*[|k\rangle\langle k|]$ are the components of an observable with a discrete set of
outcomes $X = \{k\}$.
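
A discrete, finite-dimensional instance of (11.48) can be tested for the entanglement-breaking property. The sketch below is an added illustration, not the construction of the text: it assumes NumPy, uses a qubit measure-and-prepare map with hypothetical toy data, applies it to one half of a maximally entangled state, and checks positivity of the partial transpose, which for two qubits is known to be equivalent to separability (the Peres-Horodecki criterion).

```python
import numpy as np

def trine_povm():
    """Three-outcome qubit POVM with elements M_k = (2/3)|psi_k><psi_k|."""
    povm = []
    for k in range(3):
        psi = np.array([np.cos(np.pi * k / 3), np.sin(np.pi * k / 3)])
        povm.append((2.0 / 3.0) * np.outer(psi, psi))
    return povm

def measure_and_prepare(A, povm, prepared):
    """Linear map A -> sum_k Tr(A M_k) S_k, a discrete instance of (11.48)."""
    return sum(np.trace(A @ M) * Sk for M, Sk in zip(povm, prepared))

def channel_on_half_of_entangled_state(channel, d):
    """(Phi (x) Id_R) applied to the maximally entangled state on C^d (x) C^d."""
    out = 0
    for i in range(d):
        for j in range(d):
            E_ij = np.zeros((d, d), dtype=complex)
            E_ij[i, j] = 1.0
            out = out + np.kron(channel(E_ij), E_ij) / d
    return out

def partial_transpose_R(rho, dB, dR):
    """Transpose the second (reference) tensor factor of rho."""
    T = rho.reshape(dB, dR, dB, dR)
    return T.transpose(0, 3, 2, 1).reshape(dB * dR, dB * dR)

povm = trine_povm()
prepared = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0]), np.eye(2) / 2]   # toy states S_k
eb_out = channel_on_half_of_entangled_state(
    lambda A: measure_and_prepare(A, povm, prepared), 2)
id_out = channel_on_half_of_entangled_state(lambda A: A, 2)
eb_min = np.linalg.eigvalsh(partial_transpose_R(eb_out, 2, 2)).min()
id_min = np.linalg.eigvalsh(partial_transpose_R(id_out, 2, 2)).min()
print(eb_min, id_min)
```

The measure-and-prepare output has a nonnegative partial transpose, whereas the identity channel leaves the maximally entangled state intact and produces a negative eigenvalue.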
Sketch of the proof of Theorem 11.31. Let the channel have the form (11.48).
Consider a state $\omega \in \mathfrak{S}(\mathcal{H}_A \otimes \mathcal{H}_R)$. Then

$$(\Phi \otimes \mathrm{Id}_R)(\omega) = \int_X S_B(x) \otimes m_\omega(dx), \tag{11.49}$$

where

$$m_\omega(B) = \operatorname{Tr}_A \omega(M(B) \otimes I_R), \quad B \subset X.$$

Any matrix element of the operator-valued measure $m_\omega$ (in a particular basis) is a
complex-valued measure on $X$, absolutely continuous with respect to the probability
measure $\mu_\omega(B) = \operatorname{Tr} m_\omega(B)$, $B \subset X$. The Radon-Nikodym theorem implies the
representation

$$m_\omega(B) = \int_B \sigma_\omega(x)\,\mu_\omega(dx),$$

where $\sigma_\omega(x)$ is a function on $X$ that takes values in $\mathfrak{S}(\mathcal{H}_R)$. By using this
representation, we can rewrite (11.49) as

$$(\Phi \otimes \mathrm{Id}_R)(\omega) = \int_X S_B(x) \otimes \sigma_\omega(x)\,\mu_\omega(dx), \tag{11.50}$$

which, by Proposition 11.27, is a separable state.

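Conversely, let $\Phi$ be an entanglement-breaking channel. Fix a nondegenerate state
$\sigma_A$ in $\mathfrak{S}(\mathcal{H}_A)$ and let $\{|e_j\rangle; j = 1, \dots\}$ be the basis of the eigenvectors of $\sigma_A$ with
corresponding (positive) eigenvalues $\{\lambda_j\}$. Consider the unit vector

$$|\Omega\rangle = \sum_{j=1}^{+\infty} \sqrt{\lambda_j}\,|e_j\rangle \otimes |e_j\rangle$$

in the space $\mathcal{H}_A \otimes \mathcal{H}_A$. Then $|\Omega\rangle\langle\Omega|$ is a purification of the state $\sigma_A$. Since $\Phi$ is
entanglement-breaking, the state

$$\sigma_{AB} = (\mathrm{Id}_A \otimes \Phi)[|\Omega\rangle\langle\Omega|] \tag{11.51}$$

in $\mathfrak{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$ is separable. By (11.43), there exists a probability measure $\pi$ on
$\mathfrak{S}(\mathcal{H}_A) \times \mathfrak{S}(\mathcal{H}_B)$ such that

$$\sigma_{AB} = \int_{\mathfrak{S}(\mathcal{H}_A)}\int_{\mathfrak{S}(\mathcal{H}_B)} S_A \otimes S_B\,\pi(dS_AdS_B). \tag{11.52}$$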
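This implies

$$\sigma_A = \operatorname{Tr}_B(\mathrm{Id}_A \otimes \Phi)(|\Omega\rangle\langle\Omega|) = \int_{\mathfrak{S}(\mathcal{H}_A)}\int_{\mathfrak{S}(\mathcal{H}_B)} S_A\,\pi(dS_AdS_B) = \int_{\mathfrak{S}(\mathcal{H}_A)}\int_{\mathfrak{S}(\mathcal{H}_B)} \bar S_A\,\pi(dS_AdS_B), \tag{11.53}$$

where the bar denotes complex conjugation in the basis $\{|e_j\rangle\}$ (the last equality holds
because $\sigma_A$ is diagonal with real entries in this basis). By this equality, for
an arbitrary Borel set $B \subset \mathfrak{S}(\mathcal{H}_B)$, the operator

$$M(B) = \sigma_A^{-1/2}\left[\int_{\mathfrak{S}(\mathcal{H}_A)}\int_B \bar S_A\,\pi(dS_AdS_B)\right]\sigma_A^{-1/2} \tag{11.54}$$

can be correctly defined as a bounded positive operator on $\mathcal{H}_A$ such that
$M(B) \le M(X) = I_A$. It is easy to check that $M$ is an observable with values in
$X = \mathfrak{S}(\mathcal{H}_B)$.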
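Now, consider the entanglement-breaking channel

$$\hat\Phi(S) = \int_{\mathfrak{S}(\mathcal{H}_B)} S_B\,\mu_S^M(dS_B),$$

and let us show that $\hat\Phi(S) = \Phi(S)$. For this it is sufficient to prove that

$$\hat\Phi(|e_i\rangle\langle e_j|) = \Phi(|e_i\rangle\langle e_j|)$$

for all $i, j$. However,

$$\begin{aligned}
\hat\Phi(|e_i\rangle\langle e_j|) &= \int_{\mathfrak{S}(\mathcal{H}_B)} S_B\,\langle e_j|M(dS_B)|e_i\rangle \\
&= \lambda_i^{-1/2}\lambda_j^{-1/2}\int_{\mathfrak{S}(\mathcal{H}_A)}\int_{\mathfrak{S}(\mathcal{H}_B)} \langle e_i|S_A|e_j\rangle\,S_B\,\pi(dS_AdS_B) \\
&= \lambda_i^{-1/2}\lambda_j^{-1/2}\operatorname{Tr}_A(|e_j\rangle\langle e_i| \otimes I_B)\,\sigma_{AB} = \Phi(|e_i\rangle\langle e_j|),
\end{aligned}$$

where in the last equality relation (11.51) was used. □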

As was shown in Section 8.3.3, in finite dimensions, entanglement-breaking channels
form a large class, in which the additivity conjecture for the classical capacity
holds. This fact can be generalized to the infinite-dimensional case.

Proposition 11.32. Let there be two channels $\Phi_1$, $\Phi_2$ with the corresponding input
constraints $F_1$, $F_2$, satisfying the conditions of Corollary 11.24, and let $\Phi_1$ be an
entanglement-breaking channel. Then all additivity properties (8.36), (8.33), and
(8.39) hold. In particular, the classical capacity of an entanglement-breaking channel
$\Phi$ is equal to

$$C(\Phi, F, E) = C_\chi(\Phi, F, E).$$

The proof of this statement is obtained by generalizing the argument for the
finite-dimensional case (Proposition 8.19), with the replacement of sums with integrals
and ensembles with generalized ensembles, taking into account that the conditions
of Corollary 11.24 guarantee finiteness of all involved entropies.

11.8 Notes and references 

1. The first attempt at a mathematically rigorous treatment of the basic notions of 
Quantum Mechanics in a separable Hilbert space was made in the classical treatise of 
von Neumann [212], In particular, the difference between Hermitian (symmetric) and 
self-adjoint operators, and the key role of self-adjointness plays in the spectral decom¬ 
position were stressed, in contrast with the preceding works by physicists. The other 
important circle of ideas is related to the notions of trace and trace-class operators. 
A modem treatment of these concepts, in connection with applications in quantum 
theory, is given in the books [171], [48], and [107]. 

Lemma 11.1 is due to Dell’Antonio [49], see also Appendix to the book of Davies [48]. 
The compactness criterion for subsets of quantum states is a modification of a result of 
Sarymsakov [175], called by him the “noncommutative Prokhorov’s Theorem”. The 
latter provides a criterion for weak compactness of families of probability measures 
on a metric space, see [164], It is used in Section 11.4. 

2. It is well known that the topological properties of entropy in the infinite¬ 
dimensional case differ sharply from those in finite dimensions. In the last case, 
the entropy is a bounded, continuous function on S(fK), while in infinite dimen¬ 
sions it is discontinuous everywhere and infinite “almost everywhere” in the sense 
that the set of states with finite entropy is a first-category subset of <Z(X). See the 
survey of Wehrl [215], where a number of useful properties of the entropy, includ¬ 
ing Theorem 11.6 and Lemma 11.8, are discussed. For recent results concerning the 
topological properties of the entropy in infinite dimensions, see Shirokov [183], 

3. The importance of considering input constraints for quantum channels was clear 
from the beginnings of quantum communication. For a detailed physical discussion 
of the problem of the determination of the capacity of an optical channel with con¬ 
strained energy of the input signal, see the survey [38]. The present study of the 
classical capacity of a constrained c-q channel is based on the works [98] and [102], 
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taking advantage of the random encoding (11.21), which is used in a similar classical 
problem. 

4. Channels with a continuous alphabet were considered in the works of Holevo [102]
and Holevo and Shirokov [110], where the notion of a generalized ensemble was
systematically applied. Proofs of the facts used from the theory of probability measures
on metric spaces, such as Exercise 11.12 and Lemma 11.13, can be found in the
monograph of Parthasarathy [164].

5. For the proof of the boundedness of the map $\Phi$ in Exercise 11.17, see the book [48],
Lemma 2.2.1. Our consideration of the $\chi$-capacity of constrained quantum channels
is based on the works [102] and [110]. A detailed investigation of the properties
of the $\chi$-capacity and other entropic characteristics of infinite-dimensional channels,
without assuming finiteness of the output entropy, was made by Shirokov [185], [184].

6. This section is based on the results of the work [102].

7. The concept of a quantum observable as POVM is presented in detail in the
books [107], [99]. Theorem 11.31 is proved in the paper of Holevo, Shirokov, and
Werner [111]. Concerning the proof of Proposition 11.32, see [106].



Chapter 12 

Gaussian systems 


The fundamental physical information carrier is the electromagnetic field as exemplified
by light or radio waves. Mathematically, the radiation field is known to be equivalent
to an ensemble of oscillators. In quantum optics, one considers the quantized
field and hence quantum oscillators. This is a typical “continuous variable” bosonic
quantum system, whose basic observables (oscillator amplitudes) satisfy the canonical
commutation relations (CCR). Many of the current experimental realizations of
quantum information processing are carried out in such systems.

There is a class of particularly important states of bosonic systems which naturally
correspond to classical multidimensional Gaussian distributions. From a physical
viewpoint, this class comprises thermal equilibrium states of light as well as coherent
and squeezed states produced by lasers and some “nonlinear” quantum optics
devices. Mathematically, they are completely characterized by the mean and the
covariance matrix, in close parallel with the classical case. Restricting ourselves to a
finite number of oscillator modes, which is the usual approximation in quantum optics,
makes it possible to handle them with techniques from finite-dimensional linear
algebra.

Our central problems will be to unravel the structure of quantum Gaussian channels
and to find their capacities. Both problems appear rather involved mathematically
and are up to now only partially solved (especially as concerns the capacity problem).
Along with the partial results, we provide formulations of the basic conjectures,
which, we hope, will stimulate attempts at their solutions.

Unavoidably, a number of analytical complications related to infinite dimensionality
and the unboundedness of operators arise in connection with bosonic systems and
Gaussian states. In our treatment of CCR, we focus on the aspects essential to applications,
while a detailed presentation of the related mathematical tools can be found
in the literature. For the convenience of nonspecialist readers, some mathematical and
physical preliminaries, as well as motivation, are given in the introductory section.

12.1 Preliminary material 

12.1.1 Spectral decomposition and Stone’s Theorem 


Let $\mathcal{H}$ be a separable Hilbert space and let $\{E(B);\ B \in \mathcal{B}(\mathbb{R})\}$ be a spectral measure
on $\mathbb{R}$, where $\mathcal{B}(\mathbb{R})$ is the $\sigma$-algebra of Borel subsets. According to Definition 11.29,
the spectral measure $E$ defines a sharp real observable. As explained below, such
observables are in one-to-one correspondence with self-adjoint operators in $\mathcal{H}$.

For an arbitrary unit vector $\psi \in \mathcal{H}$, the relation

$$\mu_\psi(B) = \langle\psi|E(B)|\psi\rangle$$

defines a Borel probability measure on $\mathbb{R}$. If $f(x)$ is a Borel measurable function
on $\mathbb{R}$, the integral

$$X_f = \int_{-\infty}^{\infty} f(x)\,E(dx) \tag{12.1}$$

converges strongly on the dense domain

$$\mathcal{D}(X_f) = \Big\{\psi : \int_{-\infty}^{\infty} |f(x)|^2\,\mu_\psi(dx) < \infty\Big\}$$

and thus uniquely defines a linear operator on the domain $\mathcal{D}(X_f)$. (Strong convergence
of operators $\{X_n\}$ to the operator $X$ on the domain $\mathcal{D}$ means that
$\lim_{n\to\infty}\|X_n\psi - X\psi\| = 0$ for $\psi \in \mathcal{D}$. For the case where $\mathcal{D} = \mathcal{H}$ it is just
the strong operator convergence.)

Indeed, if $f(x) = \sum_i f_i 1_{B_i}(x)$ is a countably-valued function (here $\{B_i\}$ is a
measurable decomposition of $\mathbb{R}$), then

$$X_f = \sum_{i=1}^{\infty} f_i E(B_i),$$

and due to the orthogonality of the vectors $E(B_i)|\psi\rangle$,

$$\|X_f\psi\|^2 = \sum_{i=1}^{\infty} |f_i|^2 \langle\psi|E(B_i)|\psi\rangle = \sum_{i=1}^{\infty} |f_i|^2 \mu_\psi(B_i), \tag{12.2}$$

so that $\mathcal{D}(X_f)$ is just the set of all vectors $|\psi\rangle$ for which this series converges. This
definition can be extended to an arbitrary Borel function $f(x)$ by approximating it
uniformly with countably-valued functions and using (12.2). In particular, for the
function $f(x) = x$, we obtain the operator

$$X = \int_{-\infty}^{\infty} x\,E(dx) \tag{12.3}$$

with the domain

$$\mathcal{D}(X) = \Big\{\psi : \int_{-\infty}^{\infty} |x|^2\,\mu_\psi(dx) < \infty\Big\},$$

for which

$$\langle\psi|X|\psi\rangle = \int_{-\infty}^{\infty} x\,\mu_\psi(dx); \qquad \|X\psi\|^2 = \int_{-\infty}^{\infty} |x|^2\,\mu_\psi(dx), \qquad \psi \in \mathcal{D}(X). \tag{12.4}$$

The operators obtained in this way are self-adjoint in the following sense. For any
densely defined operator $X$, there is a unique adjoint $X^*$ satisfying

$$\langle X^*\varphi|\psi\rangle = \langle\varphi|X\psi\rangle, \qquad \psi \in \mathcal{D}(X),\ \varphi \in \mathcal{D}(X^*), \tag{12.5}$$

where $\mathcal{D}(X^*)$ is the (dense) subspace of all vectors $\varphi$ such that the right-hand side is
a bounded linear functional of $\psi$. An operator $X$ is called Hermitian (symmetric) if

$$\langle X\varphi|\psi\rangle = \langle\varphi|X\psi\rangle \tag{12.6}$$

for all $\varphi, \psi \in \mathcal{D}(X)$, and self-adjoint if $X = X^*$ in the sense that $\mathcal{D}(X^*) = \mathcal{D}(X)$
and (12.6) holds. Often, one has to deal with essentially self-adjoint operators
defined on some non-maximal domain which, however, extend uniquely to self-adjoint
operators.

The Spectral Theorem (see e.g. [171], Section VIII.3) says that for any self-adjoint
operator $X$ there exists a unique spectral measure $E$ on $\mathbb{R}$ for which the relations
(12.3), (12.4) hold. Thus, self-adjoint operators are those which have the spectral
decomposition (12.3) over the real line. Such operators are naturally associated with
(sharp) real observables in Quantum Mechanics. In this case, the measure $\mu_\psi$ is
the probability distribution of the observable $X$ in the pure state $|\psi\rangle\langle\psi|$, and the
formulas (12.4) provide the first and the second moments of this distribution.

In a certain sense, on which we will not comment here, the relation $X_f = f(X)$
holds. By taking $f(x) = \exp(itx)$ in (12.1), we obtain the operators

$$V_t = \int_{-\infty}^{\infty} \exp(itx)\,E(dx) = \exp itX.$$

Then the family $\{V_t;\ t \in \mathbb{R}\}$ is a strongly continuous group of unitary operators with
the infinitesimal generator $X$:

$$\lim_{t\to 0}\,(it)^{-1}(V_t - I)|\psi\rangle = X|\psi\rangle; \qquad \psi \in \mathcal{D}(X). \tag{12.7}$$

Conversely, any strongly continuous group of unitary operators $\{V_t;\ t \in \mathbb{R}\}$ has the
form $V_t = \exp itX$, where $X$ is a uniquely defined self-adjoint operator. These
statements are the content of Stone’s Theorem.

Exercise 12.1. 1. Let $\mathcal{H} = L^2(\mathbb{R})$ be the Hilbert space of square integrable functions
$\psi(\xi)$, $\xi \in \mathbb{R}$, and let $E = \{E(B);\ B \in \mathcal{B}(\mathbb{R})\}$ be defined by the expression

$$(E(B)\psi)(\xi) = 1_B(\xi)\,\psi(\xi).$$

Prove that $E$ is the spectral measure of the operator $q$ of multiplication by $\xi$,
while $V_t$ is the operator of multiplication by $\exp it\xi$ and, in general, $X_f$ is the
operator of multiplication by $f(\xi)$.

2. Consider the unitary operator in $\mathcal{H}$ defined by the Fourier transform

$$(F\psi)(\lambda) = \frac{1}{\sqrt{2\pi}}\int \exp(-i\lambda\xi)\,\psi(\xi)\,d\xi.$$

Then $\tilde{E}(B) = F^* E(B) F$ is a spectral measure on $\mathbb{R}$, corresponding to the
self-adjoint operator $p = F^* q F = \frac{1}{i}\frac{d}{d\xi}$, which generates the group of unitary operators
$(\exp(itp)\,\psi)(\xi) = \psi(\xi + t)$.

Self-adjoint operators $X_j;\ j = 1,\dots,s$ commute if their spectral measures commute.
The following is a multidimensional version of Stone’s Theorem.

Theorem 12.2. Let $V_x;\ x = [x_1,\dots,x_s] \in \mathbb{R}^s$ be a strongly continuous group of
unitary operators in a separable Hilbert space $\mathcal{H}$. Then there exists a family $X_j;\ j =
1,\dots,s$ of commuting self-adjoint operators in $\mathcal{H}$ such that

$$V_x = \exp\Big(i\sum_{j=1}^{s} x_j X_j\Big). \tag{12.8}$$

Conversely, for any family $X_j;\ j = 1,\dots,s$ of commuting self-adjoint operators
in $\mathcal{H}$, relation (12.8) defines a strongly continuous unitary group in $\mathcal{H}$.

12.1.2 Operators associated with the Heisenberg commutation relation 

In the Hilbert space $\mathcal{H} = L^2(\mathbb{R})$ we consider the operators

$$(q\psi)(\xi) = \xi\,\psi(\xi), \qquad (p\psi)(\xi) = \frac{\hbar}{i}\frac{d}{d\xi}\psi(\xi),$$

defined on a common dense domain $\mathcal{D} \subset \mathcal{H}$. For example, $\mathcal{D}$ can be the subspace
$\mathcal{S}(\mathbb{R})$ of infinitely differentiable, rapidly decreasing functions, with all derivatives
tending to zero faster than any inverse power of $|\xi|$ when $|\xi| \to \infty$. These operators
are essentially self-adjoint. Hence, they represent (sharp) real observables (see
Section 12.1.1). Here $\hbar$ is a positive constant equal to Planck’s constant in physical
applications, where it relates different systems of units (we will later choose the system
in which $\hbar = 1$).

On this domain, $q$ and $p$ satisfy the Heisenberg commutation relation

$$[q, p] = i\hbar I. \tag{12.9}$$

For a unit vector $\psi$ from the common domain of $q$, $p$, consider a pure state $|\psi\rangle\langle\psi|$
and introduce the notation

$$x = \langle\psi|q|\psi\rangle, \qquad y = \langle\psi|p|\psi\rangle,$$

$$D_\psi(q) = \|(q - x)\psi\|^2, \qquad D_\psi(p) = \|(p - y)\psi\|^2$$

for the mean values and the variances of the observables $q$, $p$. Then for any real $\omega$

$$\omega^2 D_\psi(q) - \omega\hbar + D_\psi(p) = \omega^2 D_\psi(q) + i\omega\langle\psi|[q,p]|\psi\rangle + D_\psi(p) = \big\|[\omega(q - x) + i(p - y)]\psi\big\|^2 \ge 0,$$

whence, the quadratic polynomial in $\omega$ being nonnegative,

$$D_\psi(q)\,D_\psi(p) \ge \frac{\hbar^2}{4}, \tag{12.10}$$

with the equality attained if and only if there exists a positive $\omega$ such that

$$[\omega(q - x) + i(p - y)]\psi = 0.$$

This amounts to the differential equation

$$\Big[\omega(\xi - x) + \Big(\hbar\frac{d}{d\xi} - iy\Big)\Big]\psi(\xi) = 0, \tag{12.11}$$

the normalized solution of which, up to a constant factor of modulus one, is

$$\psi(\xi) = \Big(\frac{\omega}{\pi\hbar}\Big)^{1/4}\exp\Big[\frac{iy}{\hbar}\Big(\xi - \frac{x}{2}\Big) - \frac{\omega(\xi - x)^2}{2\hbar}\Big]. \tag{12.12}$$

Inequality (12.10) is the Heisenberg uncertainty relation and (12.12) are the minimal
uncertainty state vectors, with $D_\psi(q) = \hbar/(2\omega)$, $D_\psi(p) = \hbar\omega/2$.

By introducing the complex combinations

$$a = \frac{1}{\sqrt{2\hbar\omega}}(\omega q + ip); \qquad \zeta = \frac{1}{\sqrt{2\hbar\omega}}(\omega x + iy)$$

and denoting the corresponding state vector (12.12) by $|\zeta\rangle$, we can rewrite the defining
equation (12.11) as

$$a|\zeta\rangle = \zeta|\zeta\rangle; \qquad \zeta \in \mathbb{C}. \tag{12.13}$$

In particular,

$$a|0\rangle = 0, \tag{12.14}$$

where $|0\rangle$ is the vector described by the function

$$\Big(\frac{\omega}{\pi\hbar}\Big)^{1/4}\exp\Big[-\frac{\omega\xi^2}{2\hbar}\Big].$$

The operator $a = \frac{1}{\sqrt{2\hbar\omega}}(\omega q + ip)$ has the adjoint $a^\dagger = \frac{1}{\sqrt{2\hbar\omega}}(\omega q - ip)$. The
commutation relation (12.9) can be rewritten as

$$[a, a^\dagger] = I. \tag{12.15}$$

Consider the operator

$$\mathcal{N} = a^\dagger a = a a^\dagger - I, \tag{12.16}$$

which is essentially self-adjoint. By (12.14) it satisfies

$$\mathcal{N}|0\rangle = 0.$$


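Exercise 12.3. Successively applying relation (12.16), show that the vectors

$$|n\rangle = \frac{(a^\dagger)^n}{\sqrt{n!}}\,|0\rangle \tag{12.17}$$

form an orthonormal system of eigenvectors of the operator $\mathcal{N}$:

$$\mathcal{N}|n\rangle = n|n\rangle; \qquad n = 0, 1, \dots$$

The corresponding functions in $L^2(\mathbb{R})$, up to normalizing factors, are

$$\Big[\omega\xi - \hbar\frac{d}{d\xi}\Big]^n \exp\Big[-\frac{\omega\xi^2}{2\hbar}\Big] \propto H_n\Big(\sqrt{\frac{\omega}{\hbar}}\,\xi\Big)\exp\Big[-\frac{\omega\xi^2}{2\hbar}\Big],$$

where $H_n(\cdot);\ n = 0, 1, \dots$ are the Hermite polynomials, which form a complete
orthogonal system in $L^2(\mathbb{R})$. Hence, $\{|n\rangle;\ n = 0, 1, \dots\}$ is an orthonormal basis in
$\mathcal{H}$,

$$\sum_{n=0}^{\infty} |n\rangle\langle n| = I,$$

and $\mathcal{N}$ has the spectral decomposition

$$\mathcal{N} = \sum_{n=0}^{\infty} n\,|n\rangle\langle n|. \tag{12.18}$$

Exercise 12.4. Operator $\mathcal{N}$ is self-adjoint of the type $\mathfrak{F}$ (Definition 11.3) with
the domain

$$\mathcal{D}(\mathcal{N}) = \Big\{\psi : \sum_{n=0}^{\infty} n^2\,|\langle n|\psi\rangle|^2 < \infty\Big\}$$

and (12.18) holds in the sense of strong convergence on $\mathcal{D}(\mathcal{N})$.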

Further, the relations

$$a|n\rangle = \sqrt{n}\,|n - 1\rangle; \qquad a^\dagger|n\rangle = \sqrt{n + 1}\,|n + 1\rangle$$

hold. By introducing the group of unitary operators $\{e^{i\mathcal{N}\omega t};\ t \in \mathbb{R}\}$, one has

$$e^{i\mathcal{N}\omega t}\,a\,e^{-i\mathcal{N}\omega t} = a e^{-i\omega t}; \qquad e^{i\mathcal{N}\omega t}\,a^\dagger e^{-i\mathcal{N}\omega t} = a^\dagger e^{i\omega t}. \tag{12.19}$$

Coming back to the operators $q$, $p$, one sees that

$$\hbar\omega\Big(\mathcal{N} + \frac{I}{2}\Big) = \frac{1}{2}(\omega^2 q^2 + p^2) = \hbar H \tag{12.20}$$

is the energy operator for the quantum harmonic oscillator with frequency $\omega$. Then $\mathcal{N}$
is the number (of quanta) operator, $|n\rangle$ are the number state vectors, $|0\rangle$ is the vacuum
state vector, and $a$ (resp. $a^\dagger$) is the annihilation (resp. creation) operator. In quantum
optics the operators $a$, $a^\dagger$ describe a mode of the electromagnetic field corresponding
to a definite frequency (and a definite polarization).
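
The ladder relations above can be illustrated on a finite truncation of the number basis. The following is only a sketch: the cutoff `dim` and the helper names are assumptions made for illustration, and the truncated matrices satisfy $[a, a^\dagger] = I$ only away from the last basis vector, since the true operators act in an infinite-dimensional space.

```python
import math


def annihilation(dim):
    """Matrix of a in the truncated number basis |0>, ..., |dim-1>."""
    a = [[0.0] * dim for _ in range(dim)]
    for n in range(1, dim):
        a[n - 1][n] = math.sqrt(n)  # a|n> = sqrt(n) |n-1>
    return a


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


dim = 6                        # hypothetical cutoff
a = annihilation(dim)
a_dag = transpose(a)           # the matrix is real, so the adjoint is the transpose
number = matmul(a_dag, a)      # N = a^dagger a, diagonal with entries 0, 1, ..., dim-1
aa_dag = matmul(a, a_dag)
commutator = [[aa_dag[i][j] - number[i][j] for j in range(dim)]
              for i in range(dim)]

# Diagonal of [a, a^dagger]: equal to 1 except in the last entry,
# where the truncation shows up.
print([round(commutator[n][n], 12) for n in range(dim)])
```

The defect in the last diagonal entry is a reminder that the relation (12.15) has no finite-dimensional representation: the trace of a commutator of finite matrices vanishes, while the trace of the identity does not.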


12.1.3 Classical signal plus quantum noise 

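The states $|\zeta\rangle\langle\zeta|;\ \zeta \in \mathbb{C}$, are called coherent. From (12.17), (12.13), we obtain

$$\langle n|\zeta\rangle = \frac{1}{\sqrt{n!}}\langle 0|a^n|\zeta\rangle = \frac{\zeta^n}{\sqrt{n!}}\langle 0|\zeta\rangle = \frac{\zeta^n}{\sqrt{n!}}\exp\Big(-\frac{|\zeta|^2}{2}\Big),$$

where we also used the formula

$$\langle\zeta_1|\zeta_2\rangle = \exp\Big[-\frac{1}{2}\big(|\zeta_1|^2 + |\zeta_2|^2 - 2\bar{\zeta}_1\zeta_2\big)\Big]. \tag{12.21}$$


Exercise 12.5. Prove (12.21) by using the real parametrization (12.12) of the
vectors $|\zeta\rangle$.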
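
Formula (12.21) can be cross-checked numerically against the number-basis amplitudes $\langle n|\zeta\rangle$ given above. This is a sketch under stated assumptions: the function names and the truncation at 60 terms are illustrative choices, adequate only for moderate $|\zeta|$.

```python
import cmath
import math


def number_amplitude(n, zeta):
    """<n|zeta> = zeta^n / sqrt(n!) * exp(-|zeta|^2 / 2)."""
    return zeta ** n / math.sqrt(math.factorial(n)) * math.exp(-abs(zeta) ** 2 / 2)


def overlap_by_series(zeta1, zeta2, terms=60):
    """<zeta1|zeta2> as the truncated sum over n of conj(<n|zeta1>) <n|zeta2>."""
    return sum(number_amplitude(n, zeta1).conjugate() * number_amplitude(n, zeta2)
               for n in range(terms))


def overlap_closed_form(zeta1, zeta2):
    """Right-hand side of (12.21)."""
    return cmath.exp(-0.5 * (abs(zeta1) ** 2 + abs(zeta2) ** 2
                             - 2 * zeta1.conjugate() * zeta2))


z1, z2 = 0.7 + 0.2j, -0.3 + 1.1j
print(abs(overlap_by_series(z1, z2) - overlap_closed_form(z1, z2)))
```

In particular $|\langle\zeta_1|\zeta_2\rangle|^2 = \exp(-|\zeta_1 - \zeta_2|^2)$, so distinct coherent states are never orthogonal.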

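Exercise 12.6. The system of vectors $\{|\zeta\rangle;\ \zeta \in \mathbb{C}\}$ is overcomplete in the sense

$$\frac{1}{\pi}\int_{\mathbb{C}} |\zeta\rangle\langle\zeta|\,d^2\zeta = I,$$

where $d^2\zeta = \frac{1}{2\hbar}\,dx\,dy$. Hint: the matrix elements of the integral in the basis
$\{|n\rangle\}$ have the form

$$\frac{1}{\pi}\int \frac{\zeta^n\,\bar{\zeta}^m}{\sqrt{n!}\sqrt{m!}}\exp(-|\zeta|^2)\,d^2\zeta.$$

Let $p(\zeta)$ be a probability density on $\mathbb{C}$. Then

$$S = \int |\zeta\rangle\langle\zeta|\,p(\zeta)\,d^2\zeta \tag{12.22}$$

is a density operator of a state called classical in quantum optics. In particular, taking
the complex Gaussian density with zero mean and variance $N$, we obtain the density
operator

$$S_0 = \frac{1}{\pi N}\int |\zeta\rangle\langle\zeta|\exp\Big(-\frac{|\zeta|^2}{N}\Big)\,d^2\zeta \tag{12.23}$$

with the matrix elements

$$\langle n|S_0|m\rangle = \frac{1}{\pi N}\int \frac{\zeta^n\,\bar{\zeta}^m}{\sqrt{n!\,m!}}\exp\Big(-\frac{(N + 1)|\zeta|^2}{N}\Big)\,d^2\zeta = \delta_{nm}\,\frac{1}{N + 1}\Big(\frac{N}{N + 1}\Big)^n.$$

Therefore, the operator $S_0$ has the spectral decomposition

$$S_0 = \frac{1}{N + 1}\sum_{n=0}^{\infty}\Big(\frac{N}{N + 1}\Big)^n |n\rangle\langle n|. \tag{12.24}$$

This relation also makes sense for $N = 0$ if $S_0$ is defined by the Gaussian distribution
that is degenerate at the point $\zeta = 0$.

Taking into account the spectral decomposition (12.18) and relation (12.20), this
can be written as

$$S_0 = \frac{\exp[-\theta H]}{\operatorname{Tr}\exp[-\theta H]}, \tag{12.25}$$

which is the Gibbs equilibrium (thermal) state of the quantum harmonic oscillator
at the inverse temperature $\theta/\hbar = \frac{1}{\hbar\omega}\ln\frac{N + 1}{N}$. The parameter $N = \operatorname{Tr} S_0 a^\dagger a$ is
interpreted as the mean number of the oscillator energy quanta. The von Neumann
entropy of the state (12.24) is

$$H(S_0) = \frac{1}{N + 1}\sum_{n=0}^{\infty}\Big(\frac{N}{N + 1}\Big)^n\big[(n + 1)\log(N + 1) - n\log N\big] = g(N), \tag{12.26}$$

where we introduced the function

$$g(x) = (x + 1)\log(x + 1) - x\log x, \quad x > 0; \qquad g(0) = 0.$$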

Another important density operator is obtained from the shifted probability density

$$S_\mu = \frac{1}{\pi N}\int |\zeta\rangle\langle\zeta|\exp\Big(-\frac{|\zeta - \mu|^2}{N}\Big)\,d^2\zeta, \tag{12.27}$$

where $\mu = \frac{1}{\sqrt{2\hbar\omega}}(\omega m_q + i m_p)$. Introducing the unitary operators

$$D(\zeta) = \exp\big(\zeta a^\dagger - \bar{\zeta} a\big) = \exp\Big[\frac{i}{\hbar}(yq - xp)\Big], \tag{12.28}$$

we see that

$$|\zeta\rangle = D(\zeta)|0\rangle;$$

$$D(\zeta_2)|\zeta_1\rangle = \exp\big(i\,\Im\,\bar{\zeta}_1\zeta_2\big)\,|\zeta_1 + \zeta_2\rangle.$$

That is, the action of this operator is to make a displacement in the mean values of
$q$, $p$. Physically, such a displacement is realized by the action of an external source,
such as an idealized laser producing a coherent state $|\zeta\rangle\langle\zeta|$ from the vacuum state
$|0\rangle\langle 0|$. The quasiclassical state (12.22) is a chaotic mixture of coherent states, while

$$S_\mu = D(\mu)\,S_0\,D(\mu)^* \tag{12.29}$$

represents the transformation of the equilibrium state under the action of an external
source characterized by the complex amplitude $\mu$. Notice that the state $S_\mu$ is pure
if and only if $N = 0$, in which case it coincides with the coherent state $|\mu\rangle\langle\mu|$. In
applications to information theory, $\mu$ is the classical signal, to be transmitted via the
noisy quantum mode $q$, $p$. The state $S_\mu$ thus is a model for a classical signal plus
quantum Gaussian noise (see the next section).

Exercise 12.7. Prove that the displacement operators satisfy the commutation
relation

$$D(\zeta_2)D(\zeta_1) = \exp\big(i\,\Im\,\bar{\zeta}_1\zeta_2\big)\,D(\zeta_1 + \zeta_2), \tag{12.30}$$

which implies

$$D(\mu)^* D(\zeta) D(\mu) = \exp\big(2i\,\Im\,\bar{\mu}\zeta\big)\,D(\zeta). \tag{12.31}$$

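In what follows, we need the noncommutative analog of the characteristic function
for the state $S_\mu$:

$$\operatorname{Tr} S_\mu D(\zeta) = \exp\Big[2i\,\Im\,\bar{\mu}\zeta - \Big(N + \frac{1}{2}\Big)|\zeta|^2\Big]. \tag{12.32}$$

Proof of formula (12.32). By (12.29), (12.31) we have

$$\operatorname{Tr} S_\mu D(\zeta) = \exp\big(2i\,\Im\,\bar{\mu}\zeta\big)\operatorname{Tr} S_0 D(\zeta)$$

and it is sufficient to prove (12.32) for $\mu = 0$. Now,

$$\operatorname{Tr} S_0 D(\zeta) = \int \langle\zeta_1|D(\zeta)|\zeta_1\rangle\,\frac{1}{\pi N}\exp\Big(-\frac{|\zeta_1|^2}{N}\Big)\,d^2\zeta_1 \tag{12.33}$$

$$= \int \langle 0|D(\zeta_1)^* D(\zeta) D(\zeta_1)|0\rangle\,\frac{1}{\pi N}\exp\Big(-\frac{|\zeta_1|^2}{N}\Big)\,d^2\zeta_1$$

$$= \int \exp\big(2i\,\Im\,\bar{\zeta}_1\zeta\big)\,\langle 0|D(\zeta)|0\rangle\,\frac{1}{\pi N}\exp\Big(-\frac{|\zeta_1|^2}{N}\Big)\,d^2\zeta_1.$$

By using (12.21), we obtain

$$\langle 0|D(\zeta)|0\rangle = \langle 0|\zeta\rangle = \exp\Big(-\frac{|\zeta|^2}{2}\Big).$$

Substituting this expression into (12.33) and using the expression for the characteristic
function of the Gaussian probability distribution, we obtain

$$\operatorname{Tr} S_0 D(\zeta) = \exp\Big(-\Big(N + \frac{1}{2}\Big)|\zeta|^2\Big).$$

□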


12.1.4 The classical-quantum Gaussian channel 

The mapping $\mu \to S_\mu$ can be considered as a classical-quantum (c-q) channel in the
sense of Section 11.4, realizing the transmission of the classical signal $\mu \in \mathbb{C}$ with
the additive quantum Gaussian noise of power $N$. For the first two moments, we have

$$\operatorname{Tr} S_\mu a = \mu, \qquad \operatorname{Tr} S_\mu a^\dagger a = N + |\mu|^2. \tag{12.34}$$

To compute its classical capacity, we can use the general approach of Section 8.2.
Assuming that the words $w = (\mu_1, \dots, \mu_n)$ are transmitted by independent uses of
this channel (memoryless channel), we impose the additive energy constraint on the
signal $\mu$ of type (11.16),

$$|\mu_1|^2 + \dots + |\mu_n|^2 \le nE, \tag{12.35}$$

which corresponds to the constraint function $f(\mu) = |\mu|^2$. In this case, the conditions
of Theorem 11.15 are satisfied with the choice $F = a^\dagger a = \mathcal{N}$, and the classical
capacity of the channel $\mu \to S_\mu$ with the energy input constraint (12.35) is equal to

$$C = \max_{\pi \in \mathcal{P}_E}\Big[H(\bar{S}_\pi) - \int H(S_\mu)\,\pi(d^2\mu)\Big], \tag{12.36}$$

where $\mathcal{P}_E$ is the set of input distributions $\pi(d^2\mu)$ satisfying

$$\int |\mu|^2\,\pi(d^2\mu) \le E, \tag{12.37}$$

and

$$\bar{S}_\pi = \int S_\mu\,\pi(d^2\mu).$$

Note that, by (12.29), the states $S_\mu$ are unitarily equivalent and hence have the same
entropy $H(S_\mu) = H(S_0) = g(N)$, see (12.26). Therefore, for any input distribution
$\pi(d^2\mu)$, the $\chi$-quantity (11.24) is equal to

$$\chi(\pi) = H(\bar{S}_\pi) - H(S_0),$$

and the problem reduces to maximization of the entropy $H(\bar{S}_\pi)$ under the
constraint (12.37). Due to (12.34), relation (12.37) implies

$$\operatorname{Tr} \bar{S}_\pi a^\dagger a \le N + E. \tag{12.38}$$

This is a restriction on the second moments of the state $\bar{S}_\pi$. According to the maximal
entropy principle (see Lemma 12.25 below), the maximal value of the entropy

$$H(\bar{S}_\pi) = g(N + E) \tag{12.39}$$

is attained by the Gaussian density operator

$$\bar{S}_\pi = \frac{1}{N + E + 1}\sum_{n=0}^{\infty}\Big(\frac{N + E}{N + E + 1}\Big)^n |n\rangle\langle n|, \tag{12.40}$$

which fulfils the condition (12.38) with the equality sign. It corresponds to the optimal
distribution

$$\pi(d^2\mu) = \frac{1}{\pi E}\exp\Big(-\frac{|\mu|^2}{E}\Big)\,d^2\mu. \tag{12.41}$$

Finally, the capacity of the memoryless c-q Gaussian channel is given by the
expression

$$C = C_\chi = g(N + E) - g(N) \tag{12.42}$$

$$= \log\Big(1 + \frac{E}{N + 1}\Big) + (N + E)\log\Big(1 + \frac{1}{N + E}\Big) - N\log\Big(1 + \frac{1}{N}\Big),$$

which behaves as $\log\big(1 + \frac{E}{N}\big)$ asymptotically in the limit $N \to \infty$, $E/N \to \text{const}$.

Thus, relation (12.42) can be regarded as the quantum generalization of Shannon’s
formula (4.51), with Gaussian white noise of power $N$. The factor 1/2 is absent
in (12.42), because one oscillator mode corresponds to the two independent identically
distributed real amplitudes $m_q$, $m_p$.

In what follows, we will develop a general theory of quantum Gaussian channels
and their capacities.
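
The capacity formula (12.42) is easy to evaluate. The sketch below compares the definition $g(N + E) - g(N)$ with the expanded form and with the classical expression $\log(1 + E/N)$ for large $N$. Natural logarithms are assumed, and the function names are illustrative rather than taken from the text.

```python
import math


def g(x):
    """g(x) = (x + 1) log(x + 1) - x log x, with g(0) = 0."""
    if x == 0:
        return 0.0
    return (x + 1) * math.log(x + 1) - x * math.log(x)


def capacity(noise, energy):
    """C = g(N + E) - g(N), the first line of (12.42)."""
    return g(noise + energy) - g(noise)


def capacity_expanded(noise, energy):
    """Expanded form in (12.42); requires noise > 0 and energy >= 0."""
    return (math.log(1 + energy / (noise + 1))
            + (noise + energy) * math.log(1 + 1 / (noise + energy))
            - noise * math.log(1 + 1 / noise))


# The two forms agree for moderate parameters ...
print(capacity(2.0, 3.0), capacity_expanded(2.0, 3.0))
# ... and for large N with E/N fixed the capacity approaches log(1 + E/N).
print(capacity(1e6, 2e6), math.log(1 + 2.0))
```

For $N = 0$ the formula reduces to $C = g(E)$, the capacity of the noiseless mode under the same energy constraint.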


12.2 Canonical commutation relations 

12.2.1 Weyl-Segal CCR 

In quantum mechanics, the canonical commutation relations (CCR) arise in the
quantization of a mechanical system with a finite number of degrees of freedom, or a
classical field represented as an infinite collection of oscillators. In quantum optics,
one usually deals only with a finite number of relevant oscillator frequencies, thus
again reducing to the case of a mechanical system with a finite number $s$ of degrees of
freedom.

Consider the Hilbert space $\mathcal{H} = L^2(\mathbb{R}^s)$ of complex, square-integrable functions
of the real variables $\xi_j;\ j = 1, \dots, s$, where $\mathbb{R}^s$ is the coordinate space of the
underlying classical system. In the space $\mathcal{H}$, we consider the two groups of unitary
operators

$$V_x\psi(\xi) = \exp(i\xi^T x)\,\psi(\xi); \qquad U_y\psi(\xi) = \psi(\xi + y), \tag{12.43}$$

where $\xi, x, y \in \mathbb{R}^s$ are understood as column vectors and $T$ denotes transposition.
These groups satisfy the Weyl CCR

$$U_y V_x = \exp(i y^T x)\,V_x U_y. \tag{12.44}$$

Notice the analogy with the discrete groups arising from (6.51). The groups $U_y$, $V_x$
describe the change of a quantum state under the displacements in position (respectively,
momentum) space, and the Weyl CCR is an expression for the kinematics of
a nonrelativistic quantum mechanical system, which in fact can be derived from the
Galilei covariance, see e.g. [107].
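
The Weyl relation (12.44) can be verified pointwise for one degree of freedom ($s = 1$) by applying both sides to a test function. This is a sketch only: the operators are modelled as Python closures acting on functions of $\xi$, and all names are illustrative assumptions.

```python
import cmath
import math


def multiply_by_phase(x):
    """V_x for s = 1: (V_x psi)(xi) = exp(i * xi * x) * psi(xi)."""
    return lambda psi: (lambda xi: cmath.exp(1j * xi * x) * psi(xi))


def shift(y):
    """U_y for s = 1: (U_y psi)(xi) = psi(xi + y)."""
    return lambda psi: (lambda xi: psi(xi + y))


def weyl_ccr_defect(psi, x, y, xi):
    """|U_y V_x psi - exp(i y x) V_x U_y psi| evaluated at the point xi."""
    lhs = shift(y)(multiply_by_phase(x)(psi))(xi)
    rhs = cmath.exp(1j * y * x) * multiply_by_phase(x)(shift(y)(psi))(xi)
    return abs(lhs - rhs)


def gaussian(xi):
    """An arbitrary test function from L^2(R)."""
    return math.exp(-xi * xi / 2)


print(weyl_ccr_defect(gaussian, 0.7, -1.3, 0.4))
```

The phase $\exp(i y x)$ appears because the shift acts on the argument of the multiplier $\exp(i\xi x)$ as well as on $\psi$.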

Computing the generators of the groups $V_x$, $U_y$ by a multidimensional analog of
formula (12.7), we obtain a particular instance of Stone’s Theorem

$$V_x = \exp\Big(i\sum_{j=1}^{s} x_j q_j\Big), \qquad U_y = \exp\Big(i\sum_{j=1}^{s} y_j p_j\Big),$$

where $x = [x_1 \dots x_s]^T$, $y = [y_1 \dots y_s]^T$ and the self-adjoint operators

$$q_j = \xi_j, \qquad p_j = \frac{1}{i}\frac{\partial}{\partial\xi_j}$$

are the canonical observables which satisfy the Heisenberg CCR

$$[q_j, p_k] = i\delta_{jk} I, \qquad [q_j, q_k] = 0, \qquad [p_j, p_k] = 0. \tag{12.45}$$

Here and in what follows, we choose units in which $\hbar = 1$. The operators $q_j$, $p_j$ are
unbounded. Hence, relations (12.45) should be considered only on a common dense
domain, such as the subspace $\mathcal{S}(\mathbb{R}^s)$.

To make use of the apparent symmetry between $x$, $y$, we introduce the $2s$-vector
$z = [x_1, y_1, \dots, x_s, y_s]^T$ and the unitary Weyl operators

$$W(z) = \exp\Big(\frac{i}{2}\,y^T x\Big)\,V_x U_y. \tag{12.46}$$

From (12.46) and (12.44) it follows that the operators $W(z)$ satisfy the Weyl-Segal
CCR

$$W(z)W(z') = \exp\Big[-\frac{i}{2}\Delta(z, z')\Big]\,W(z + z'), \tag{12.47}$$

where

$$\Delta(z, z') = \sum_{j=1}^{s}(x_j y_j' - x_j' y_j) = z^T\Delta z' \tag{12.48}$$

is the canonical symplectic form. Here, we denote by the same letter the matrix of the
form

$$\Delta = \begin{bmatrix} 0 & 1 & & & \\ -1 & 0 & & & \\ & & \ddots & & \\ & & & 0 & 1 \\ & & & -1 & 0 \end{bmatrix} = \operatorname{diag}\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}_{j=1,\dots,s}. \tag{12.49}$$

From (12.47) and the unitarity of the operators $W(z)$ it follows that $W(-z) = W(z)^*$.
Relation (12.47) implies

$$W(z')^* W(z) W(z') = \exp[-i\Delta(z, z')]\,W(z). \tag{12.50}$$

The similarity between the commutation relations (12.30) and (12.47) is not accidental.
In fact, the operators $W(-\Delta^{-1}z)$ are the multimode generalization of the unitary
displacement operators.
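
The symplectic form is simple to compute explicitly. The sketch below assumes the ordering $z = [x_1, y_1, \dots, x_s, y_s]$ and the sign convention $\Delta(z, z') = \sum_j (x_j y_j' - x_j' y_j)$, which is the one consistent with the matrix (12.49) and with the commutator (12.55); the function names are illustrative.

```python
def symplectic_matrix(s):
    """Block-diagonal matrix of the form with s blocks [[0, 1], [-1, 0]]."""
    size = 2 * s
    delta = [[0] * size for _ in range(size)]
    for j in range(s):
        delta[2 * j][2 * j + 1] = 1
        delta[2 * j + 1][2 * j] = -1
    return delta


def symplectic_form(z, z_prime):
    """Delta(z, z') = sum_j (x_j * y'_j - x'_j * y_j) for z = [x1, y1, ..., xs, ys]."""
    return sum(z[2 * j] * z_prime[2 * j + 1] - z_prime[2 * j] * z[2 * j + 1]
               for j in range(len(z) // 2))


def form_from_matrix(z, z_prime):
    """The same quantity computed as the matrix product z^T Delta z'."""
    delta = symplectic_matrix(len(z) // 2)
    size = len(z)
    return sum(z[i] * delta[i][k] * z_prime[k]
               for i in range(size) for k in range(size))


z, w = [1, 2, 3, 4], [5, 6, 7, 8]
print(symplectic_form(z, w), form_from_matrix(z, w), symplectic_form(z, z))
```

Skew-symmetry gives $\Delta(z, z) = 0$, which is what makes $t \mapsto W(tz)$ a one-parameter group in (12.47).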

Exercise 12.8. Show that in the case of one degree of freedom

$$D(\mu) = W(m_p/\hbar, -m_q/\hbar), \tag{12.51}$$

where $\mu = \frac{1}{\sqrt{2\hbar\omega}}(\omega m_q + i m_p)$, so that

$$D(\mu)^* W(z) D(\mu) = \exp i(m_q x + m_p y)\,W(z). \tag{12.52}$$

Since $\Delta(z, z) = 0$, relation (12.47) implies that for any fixed $z$ the one-parameter
family $\{W(tz);\ t \in \mathbb{R}\}$ is a unitary group. The generator of this group, computed
according to the formula (12.7), is the self-adjoint operator

$$Rz = \sum_{j=1}^{s}(x_j q_j + y_j p_j),$$

where

$$R = [q_1\ p_1\ \dots\ q_s\ p_s]$$

is the row vector of the canonical observables. According to Stone’s Theorem,

$$W(z) = \exp i Rz. \tag{12.53}$$

Exercise 12.9. Use (12.50) to prove the relation

$$W(-\Delta^{-1}z)^*\,R\,W(-\Delta^{-1}z) = R + z^T I. \tag{12.54}$$

Exercise 12.10. Show that the Heisenberg commutation relations (12.45) can be
written in the form

$$[Rz, Rz'] = i\Delta(z, z')\,I. \tag{12.55}$$

12.2.2 The symplectic space 

The space $Z = \mathbb{R}^{2s}$, equipped with the nondegenerate skew-symmetric form $\Delta(z, z')$,
is a symplectic vector space. It represents the phase space of the classical system, the
quantum version of which is described by the family of unitary operators $W(z)$ in
the Hilbert space $\mathcal{H}$. This quantization is essentially unique. Any irreducible family
of unitary operators $W(z)$ in a Hilbert space satisfying the Weyl-Segal CCR is unitarily
equivalent to the representation in $L^2(\mathbb{R}^s)$ described above, which is called the
Schrödinger representation (Stone-von Neumann’s Uniqueness Theorem, see e.g. [171],
[107]). In what follows, $W(z);\ z \in Z$ can be any irreducible representation of the
CCR.

A basis $\{e_j, h_j;\ j = 1, \dots, s\}$ in $Z$ is called symplectic if

$$\Delta(e_j, h_k) = \delta_{jk}, \qquad \Delta(e_j, e_k) = \Delta(h_j, h_k) = 0; \qquad j, k = 1, \dots, s. \tag{12.56}$$

In any such basis, the symplectic form $\Delta(z, z')$ has the standard expression (12.48).
The transition matrix $T$ from the initial symplectic basis in $Z$ to the new symplectic
basis is a matrix of symplectic transformation in $(Z, \Delta)$, which is characterized by
the property

$$\Delta(Tz, Tz') = \Delta(z, z'); \qquad z, z' \in Z.$$

The operator $J$ in $(Z, \Delta)$ is called an operator of complex structure if

$$J^2 = -1, \tag{12.57}$$

where $1$ is the identity operator in $Z$, and it is $\Delta$-positive in the sense that the bilinear
form

$$j(z, z') = \Delta(z, Jz') \tag{12.58}$$

is an inner product in $Z$. Note that $\Delta$-positivity is equivalent to the conditions

$$\Delta J = -J^T\Delta, \qquad \Delta J > 0. \tag{12.59}$$

Exercise 12.11. Prove the following statement: operator $J$ defines the structure
of a complex unitary space in $Z$ (of dimensionality $s$), in which $iz = Jz$ and the
inner product is

$$j(z, z') + i\Delta(z, z') = \Delta(z, Jz') + i\Delta(z, z').$$

In what follows, we will consider various bilinear forms $\alpha, \dots$ in the space
$Z = \mathbb{R}^{2s}$, and the matrices of such forms will be denoted by the same letters, e.g.
$\alpha(z, z') = z^T\alpha z'$, etc.
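
The defining conditions (12.57) and (12.59) of an operator of complex structure can be tested directly for a single mode. The sketch below is restricted to $2 \times 2$ matrices and assumes the standard one-mode matrix $\Delta = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ from (12.49); the function names and the tolerance are illustrative.

```python
def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def is_complex_structure(j_matrix, tol=1e-12):
    """Check J^2 = -1 and Delta-positivity (12.59) for one mode (2x2 matrices)."""
    delta = [[0.0, 1.0], [-1.0, 0.0]]
    j_squared = matmul(j_matrix, j_matrix)
    squares_to_minus_one = all(
        abs(j_squared[r][c] + (1.0 if r == c else 0.0)) < tol
        for r in range(2) for c in range(2))
    dj = matmul(delta, j_matrix)
    # Delta J = -J^T Delta is the same as symmetry of Delta J,
    # because Delta is skew-symmetric.
    symmetric = abs(dj[0][1] - dj[1][0]) < tol
    # A symmetric 2x2 matrix is positive definite iff its leading minors are positive.
    positive = dj[0][0] > 0 and dj[0][0] * dj[1][1] - dj[0][1] * dj[1][0] > 0
    return squares_to_minus_one and symmetric and positive


print(is_complex_structure([[0.0, -1.0], [1.0, 0.0]]))   # standard J
print(is_complex_structure([[0.0, 1.0], [-1.0, 0.0]]))   # -J: squares to -1, not positive
```

The second example shows that $J^2 = -1$ alone is not sufficient: replacing $J$ by $-J$ preserves (12.57) but reverses the sign of the form (12.58).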

Lemma 12.12. Let $\alpha(z, z') = z^T\alpha z'$ be an inner product in the symplectic space
$(Z, \Delta)$. Then there is a symplectic basis $\{e_j, h_j;\ j = 1, \dots, s\}$ in $Z$ such that the
form $\alpha$ is diagonal, with the matrix

$$\tilde{\alpha} = \operatorname{diag}\begin{bmatrix} \alpha_j & 0 \\ 0 & \alpha_j \end{bmatrix}_{j=1,\dots,s}, \tag{12.60}$$

where $\alpha_j > 0$.

Proof. Consider the operator $A = \Delta^{-1}\alpha$, satisfying

$$\alpha(z, z') = \Delta(z, Az').$$

The operator $A$ is skew-symmetric in the Euclidean space $(Z, \alpha)$: $A^* = -A$. According
to a theorem from linear algebra, there is an orthogonal basis $\{e_j, h_j\}$ in
$(Z, \alpha)$ and positive numbers $\{\alpha_j\}$, such that

$$Ae_j = \alpha_j h_j; \qquad Ah_j = -\alpha_j e_j.$$

Choosing the normalization $\alpha(e_j, e_j) = \alpha(h_j, h_j) = \alpha_j$ produces the symplectic
basis in $(Z, \Delta)$, with the required properties. □

For an arbitrary inner product $\alpha$ in $Z$, there is at least one operator of complex structure
$J$, commuting with the operator $A = \Delta^{-1}\alpha$, namely, the orthogonal operator $J$ from
the polar decomposition

$$A = |A|J = J|A| \tag{12.61}$$

in the Euclidean space $(Z, \alpha)$. Applying Lemma 12.12, we obtain that there is a
symplectic basis $\{e_j, h_j;\ j = 1, \dots, s\}$ in which the form $\alpha$ is diagonal with the
matrix (12.60), while $J$ has the matrix

$$\tilde{J} = \operatorname{diag}\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}_{j=1,\dots,s}, \tag{12.62}$$

so that

$$Je_j = h_j, \qquad Jh_j = -e_j.$$

Denoting by $T$ the transition matrix from the initial symplectic basis in $Z$ to the new
symplectic basis, i.e. the matrix with the columns $\{e_j, h_j;\ j = 1, \dots, s\}$, we have

$$\tilde{\Delta} = T^T\Delta T; \qquad \tilde{\alpha} = T^T\alpha T; \qquad \tilde{J} = T^{-1}JT.$$

The canonical observables in the new basis are given by the relations

$$\tilde{q}_j = Re_j, \qquad \tilde{p}_j = Rh_j; \qquad j = 1, \dots, s,$$

so that $\tilde{R} = RT$, where $\tilde{R} = [\tilde{q}_1\ \tilde{p}_1\ \dots\ \tilde{q}_s\ \tilde{p}_s]$. The complex structure is most simply
expressed in terms of the creation-annihilation operators

$$\tilde{a}_j^\dagger = \frac{1}{\sqrt{2}}(\tilde{q}_j - i\tilde{p}_j), \qquad \tilde{a}_j = \frac{1}{\sqrt{2}}(\tilde{q}_j + i\tilde{p}_j).$$

Namely, the action of the operator $J$ on the canonical observables $\tilde{R} \to \tilde{R}\tilde{J}$ is
expressed as the multiplication

$$\tilde{a}_j^\dagger \to i\tilde{a}_j^\dagger, \qquad \tilde{a}_j \to -i\tilde{a}_j; \qquad j = 1, \dots, s. \tag{12.63}$$

This follows from the fact that $[\tilde{q}_j\ \tilde{p}_j]\tilde{J} = [\tilde{p}_j\ \ {-\tilde{q}_j}]$.

12.2.3 Dynamics, quadratic operators and gauge transformations 

For any symplectic transformation $T$ in $Z$, there is a unitary operator $U_T$ in $\mathcal{H}$ such
that

$$U_T^* W(z) U_T = W(Tz). \tag{12.64}$$

This follows from Stone-von Neumann’s Uniqueness Theorem because the operators
$W(Tz);\ z \in Z$, again form an (irreducible) representation of the CCR (12.47) in
$\mathcal{H}$. In view of relation (12.53), this is equivalently expressed in terms of canonical
observables

$$U_T^* R U_T = RT. \tag{12.65}$$

Let us consider one-parameter semigroups of symplectic transformations in $Z$

$$T_t = e^{tD}; \qquad t \in \mathbb{R},$$

that describe linear dynamics in the classical phase space. We assume that the generator
$D$ is $\Delta$-positive, i.e. $\Delta(z, Dz')$ is an inner product in $Z$. In matrix notations,

$$\Delta D = -D^T\Delta > 0. \tag{12.66}$$

The quantization of the linear dynamics $T_t$ is given by the following Theorem.

Theorem 12.13. Consider the operator $H = R\epsilon R^T$ in $\mathcal{H}$, which is quadratic in
the canonical variables $R$, where $\epsilon = -\frac{1}{2}D\Delta^{-1}$. Then $H$ is a positive self-adjoint
operator in $\mathcal{H}$ generating the unitary group $\{e^{itH}\}$ such that

$$W(T_t z) = e^{itH}\,W(z)\,e^{-itH}. \tag{12.67}$$

Proof (sketch). In terms of generators of the Weyl unitaries, the relation (12.67)
that we wish to establish is equivalent to

$$Re^{tD} = e^{itR\epsilon R^T}\,R\,e^{-itR\epsilon R^T}, \tag{12.68}$$

with $D = -2\epsilon\Delta$.

Consider the bilinear form

$$\frac{1}{2}\Delta(z, Dz') = -z^T\Delta\epsilon\Delta z',$$

which, by our assumption about the operator $D$, is an inner product. According to
Lemma 12.12, there is a symplectic basis $\{e_j, h_j\}$ such that the matrix of this form is
diagonal

$$-\tilde{\Delta}\tilde{\epsilon}\tilde{\Delta} = \operatorname{diag}\begin{bmatrix} \omega_j/2 & 0 \\ 0 & \omega_j/2 \end{bmatrix},$$

where $\omega_j > 0$. Hence,

$$\tilde{\epsilon} = \operatorname{diag}\begin{bmatrix} \omega_j/2 & 0 \\ 0 & \omega_j/2 \end{bmatrix}.$$

By introducing the new canonical observables $\tilde{q}_j = Re_j$, $\tilde{p}_j = Rh_j$, $j = 1, \dots, s$,
we have

$$H = \sum_{j=1}^{s}\frac{\omega_j}{2}\big(\tilde{q}_j^2 + \tilde{p}_j^2\big), \tag{12.69}$$

implying that $H$ is a positive self-adjoint operator as a sum of operators of the
type (12.20) referring to different modes. In terms of the creation-annihilation
operators,

$$H = \sum_{j=1}^{s}\omega_j\big(\tilde{a}_j^\dagger\tilde{a}_j + 1/2\big). \tag{12.70}$$

In the new basis $\{e_j, h_j\}$, the matrix of operator $D = -2\epsilon\Delta$ has the form

$$\tilde{D} = \operatorname{diag}\begin{bmatrix} 0 & -\omega_j \\ \omega_j & 0 \end{bmatrix} = \tilde{J}\,\operatorname{diag}\begin{bmatrix} \omega_j & 0 \\ 0 & \omega_j \end{bmatrix},$$

where the last expression is just the polar decomposition of $\tilde{D}$, so that the complex
structure $J = T\tilde{J}T^{-1}$ arises from the similar decomposition

$$D = J|D| = |D|J. \tag{12.71}$$

Consider formula (12.68) that we wish to establish in the new basis. Taking into
account the second relation in (12.63), which shows that the right action of $\tilde{J}$ on $\tilde{R}$ is
equivalent to multiplication of the annihilation operators by $-i$, this formula reduces
to equations of the type (12.19):

$$\tilde{a}_j e^{-i\omega_j t} = e^{i\omega_j t\,\tilde{a}_j^\dagger\tilde{a}_j}\,\tilde{a}_j\,e^{-i\omega_j t\,\tilde{a}_j^\dagger\tilde{a}_j}; \qquad j = 1, \dots, s, \tag{12.72}$$

describing the dynamics of quantum harmonic oscillators with frequencies $\omega_j$. □

The components of the quantum system described by the operators $\tilde{a}_j$, $\tilde{a}_j^\dagger$ or $\tilde{q}_j$, $\tilde{p}_j$,
corresponding to the frequencies $\omega_j$, are called normal modes, and the approach based
on the representation of the energy matrix $\epsilon$ in the diagonal form is just the normal
mode decomposition.

With every complex structure, we can associate the cyclic one-parameter group
$\{e^{\varphi J};\ \varphi \in [0, 2\pi]\}$ of symplectic transformations, which we call the gauge group.
According to the above Theorem, the gauge group in $Z$ induces the unitary group of
gauge transformations in $\mathcal{H}$ by the formula

$$W(e^{\varphi J}z) = e^{-i\varphi G}\,W(z)\,e^{i\varphi G}, \tag{12.73}$$

where $G = \frac{1}{2}RJ\Delta^{-1}R^T$ is a positive self-adjoint operator in $\mathcal{H}$. In terms of
generators, (12.73) reduces to the canonical transformation

$$Re^{\varphi J} = e^{-i\varphi G}\,R\,e^{i\varphi G}, \tag{12.74}$$

which is a particular case of (12.68).

Applying the previous argument to the case where $D = J$ and using (12.57), we
obtain that there exists a symplectic basis $\{e_j, h_j;\ j = 1, \dots, s\}$ such that $Je_j =
h_j$, $Jh_j = -e_j$, in which

$$G = \sum_{j=1}^{s}\frac{1}{2}\big(\tilde{q}_j^2 + \tilde{p}_j^2\big) = \mathcal{N} + \frac{s}{2}I, \tag{12.75}$$

where

$$\mathcal{N} = \sum_{j=1}^{s}\tilde{a}_j^\dagger\tilde{a}_j$$

is the total number operator.

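An operator $X$ in $\mathcal{H}$ is called gauge-invariant if

$$e^{-i\varphi G}\,X\,e^{i\varphi G} = X$$

for all $\varphi \in [0, 2\pi]$. By using (12.74) and (12.59), we find that a quadratic operator
$X = R\epsilon R^T$, where $\epsilon$ is a symmetric positive matrix, is gauge-invariant if

$$[J, \epsilon\Delta] = 0,$$

i.e. $J$ is the operator of complex structure from the polar decomposition (12.71) of
$D = -2\epsilon\Delta$. During the proof of Theorem 12.13 we established that such a complex
structure exists for every energy matrix $\epsilon$.

Exercise 12.14. Let $\omega_j;\ j = 1, \dots, s$ be positive numbers, and let

$$H = \frac{1}{2}\sum_{j=1}^{s}\big(\omega_j^2 q_j^2 + p_j^2\big)$$

be the Hamiltonian of the ensemble of oscillators with the frequencies $\omega_j$. Then
$\tilde{q}_j = \sqrt{\omega_j}\,q_j$, $\tilde{p}_j = p_j/\sqrt{\omega_j}$, so that

$$\tilde{a}_j^\dagger = \frac{1}{\sqrt{2\omega_j}}(\omega_j q_j - ip_j), \qquad \tilde{a}_j = \frac{1}{\sqrt{2\omega_j}}(\omega_j q_j + ip_j)$$

and

$$H = \sum_{j=1}^{s}\frac{\omega_j}{2}\big(\tilde{q}_j^2 + \tilde{p}_j^2\big) = \sum_{j=1}^{s}\omega_j\big(\tilde{a}_j^\dagger\tilde{a}_j + 1/2\big).$$

The corresponding operator of complex structure is

$$J = \operatorname{diag}\begin{bmatrix} 0 & -\omega_j \\ \omega_j^{-1} & 0 \end{bmatrix},$$

while the gauge operator $G$ in $\mathcal{H}$ is given by (12.75).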


12.3 Gaussian states 

12.3.1 Characteristic function 

The characteristic function of a quantum state $S$ is defined as

$$\varphi_S(z) = \operatorname{Tr} S\,W(z); \qquad z \in Z.$$

This is a kind of noncommutative Fourier transform, uniquely defining the operator
$S$. The corresponding inversion formula is

$$S = \frac{1}{(2\pi)^s}\int \varphi_S(z)\,W(-z)\,d^{2s}z,$$

where $d^{2s}z = dx_1 \dots dy_s$ is the element of symplectic volume in $Z$. Similar relations
hold for an arbitrary trace class operator $S$, see [107].

Positivity of the operator $S$ implies (and is in fact equivalent to) the nonnegative 
definiteness of the $n \times n$ matrices 

$$\left[\phi(z_r - z_s) \exp\left(\frac{i}{2}\Delta(z_r, z_s)\right)\right]_{r,s=1,\dots,n} \geq 0 \tag{12.76}$$

for all $n = 1, 2, \dots$ and arbitrary collections $\{z_1, \dots, z_n\} \subset Z$. To see the necessity 
of this condition, notice that the matrix element is nothing but $\operatorname{Tr} W(z_s)^* S\, W(z_r)$, so 
that 

$$\sum_{r,s} c_r \bar{c}_s\, \phi(z_r - z_s) \exp\left(\frac{i}{2}\Delta(z_r, z_s)\right) = \operatorname{Tr} S A A^*,$$

where $A = \sum_r c_r W(z_r)$. 

A density operator $S$ has finite second moments if $\operatorname{Tr} S(q_j^2 + p_j^2) < \infty$ for all 
$j$, where the trace is defined as in (11.6). In this case, one can define the mean 
vector $\operatorname{Tr} S R = m$ and the covariance matrix $B_S(R) = \alpha$ (see Section 2.3.3). By the 
CCR (12.55) the commutation matrix is $C_S(R) = -\Delta$, so that 

$$\alpha + \frac{i}{2}\Delta = \operatorname{Tr}\,(R - m)^T S\, (R - m). \tag{12.77}$$

This relation (and its transpose) implies the inequality 

$$\alpha \geq \pm\frac{i}{2}\Delta, \tag{12.78}$$

which is nothing but the uncertainty relation (2.20) for the canonical observables $R$. 
In Theorem 12.17 below we show that condition (12.78) is not only necessary, but 
also sufficient for $\alpha$ to be the covariance matrix of a quantum state. We denote by 
$\mathfrak{S}(m, \alpha)$ the set of all states with the mean vector $m$ and the covariance matrix $\alpha$. 
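
For a single mode the matrix inequality (12.78) can be tested by hand. The following is a minimal sketch, assuming $\hbar = 1$ and the one-mode commutation matrix $\Delta = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$; the function name and the tolerance are illustrative and not part of the text. It uses the fact that a $2 \times 2$ Hermitian matrix is nonnegative definite exactly when its diagonal entries and its determinant are nonnegative.

```python
def satisfies_uncertainty(a_qq: float, a_qp: float, a_pp: float,
                          tol: float = 1e-12) -> bool:
    """One-mode test of alpha >= +/-(i/2) Delta (hbar = 1 assumed).

    alpha + (i/2) Delta = [[a_qq, a_qp + i/2], [a_qp - i/2, a_pp]] is
    nonnegative definite iff the diagonal is nonnegative and
    a_qq * a_pp - a_qp**2 - 1/4 >= 0; the opposite sign gives the same test.
    """
    det = a_qq * a_pp - a_qp ** 2
    return a_qq >= 0.0 and a_pp >= 0.0 and det >= 0.25 - tol


if __name__ == "__main__":
    # Covariance with a_qq = a_pp = 1/2: sits exactly on the boundary.
    print(satisfies_uncertainty(0.5, 0.0, 0.5))
    # Squeezed example: one variance below 1/2, product of variances 1/4.
    print(satisfies_uncertainty(2.0, 0.0, 0.125))
    # Too narrow in both quadratures: not a covariance matrix of a state.
    print(satisfies_uncertainty(0.4, 0.0, 0.4))
```

The same determinant criterion reappears in Example 12.24 below.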

As in probability theory, the components of m,a (as well as higher moments) can 
be expressed via the derivatives of the characteristic function. 

Exercise 12.15. Assuming the existence of the corresponding moments, show 
that 

$$\operatorname{Tr} S\,(Rz)^n = i^{-n} \left.\frac{d^n}{dt^n}\, \phi(tz)\right|_{t=0}.$$

12.3.2 Definition and properties of Gaussian states 

The characteristic function (12.32) of the one-mode density operator (12.27), expressed 
in real variables using (12.51), is 

$$\operatorname{Tr} S\, W(z) = \exp\left[i(m_q x + m_p y) - \frac{\alpha}{2}\left(x^2 + y^2\right)\right], \tag{12.79}$$

where $\alpha = N + \frac{1}{2} \geq \frac{1}{2}$. We call such a state $S$ an elementary Gaussian state. Let us 
now elaborate the general definition of a multimode Gaussian state. 

The state $S$ is called Gaussian if its characteristic function $\phi(z) = \operatorname{Tr} S\, W(z)$ has 
the form 

$$\phi(z) = \exp\left[i\, m(z) - \frac{1}{2}\alpha(z, z)\right], \tag{12.80}$$

where $m(z) = mz$ is a linear form and $\alpha(z, z) = z^T \alpha z$ is a bilinear form on $Z$. 
Here, $m$ is a row $2s$-vector and $\alpha$ is a real symmetric $(2s) \times (2s)$-matrix. 

Exercise 12.16. Considering the derivatives of (12.80) at $z = 0$, show that $m$ is 
indeed the mean vector, and $\alpha$ is the covariance matrix determined from (12.77). 

Usually, $(m, \alpha)$ are called the parameters of the Gaussian state. In the case $m = 0$ 
we call the Gaussian state centered. If $S_m$ is the Gaussian state with parameters 
$(m, \alpha)$, then 

$$S_m = W(-\Delta^{-1} m^T)\, S_0\, W(-\Delta^{-1} m^T)^*, \tag{12.81}$$

as follows from (12.54). By this unitary equivalence many questions reduce to the 
consideration of centered states. 


Theorem 12.17. For relation (12.80) to define a quantum state, it is necessary and 
sufficient that the matrix $\alpha$ satisfies condition (12.78). Under this condition, 
formula (12.80) defines the unique Gaussian state in $\mathfrak{S}(m, \alpha)$. 

Proof. It was already mentioned that the necessity follows from relation (12.77). An 
alternative proof follows from the next lemma, which will also be needed in the sequel. 


Lemma 12.18. Let $\alpha$ be an inner product and $\Delta$ a skew-symmetric form in $Z$, such that 
for all $n = 1, 2, \dots$ and arbitrary collections $\{z_1, \dots, z_n\} \subset Z$, 

$$\left[\exp\left(\alpha(z_r, z_s) + \frac{i}{2}\Delta(z_r, z_s)\right)\right]_{r,s=1,\dots,n} \geq 0. \tag{12.82}$$

In this case, the matrices of the forms $\alpha, \Delta$ satisfy condition (12.78). 

Proof. Let $t > 0$. By writing the condition of nonnegative definiteness for the 
collection $\{0, \sqrt{t}z_1, \dots, \sqrt{t}z_n\}$ with the variables $\{c_0, c_1, \dots, c_n\} \subset \mathbb{C}$ such that 
$c_0 = -\sum_{j=1}^{n} c_j$, we have 

$$\sum_{r,s=1}^{n} \bar{c}_r c_s \left\{\exp\left[t\left(\alpha(z_r, z_s) + \frac{i}{2}\Delta(z_r, z_s)\right)\right] - 1\right\} \geq 0.$$

Dividing by $t$ and letting $t \to 0$ we obtain that the matrices with elements 
$\alpha(z_r, z_s) + \frac{i}{2}\Delta(z_r, z_s)$ are nonnegative definite for an arbitrary collection $z_1, \dots, z_n$, 
which is equivalent to condition (12.78). This proves the lemma. □ 


Substituting the Gaussian expression (12.80) into (12.76), one can see that 

$$\phi(z_r - z_s) \exp\left(\frac{i}{2}\Delta(z_r, z_s)\right) = \phi(z_r)\,\overline{\phi(z_s)}\, \exp\left\{\alpha(z_r, z_s) + \frac{i}{2}\Delta(z_r, z_s)\right\}.$$

Thus, the condition of Lemma 12.18 is satisfied, implying inequality (12.78). 

To prove sufficiency, assume that the matrix $\alpha$ satisfies inequality (12.78). 
Lemma 12.12 then implies that there exists a symplectic transformation $T$ such that 

$$\tilde{\alpha} = T^T \alpha T = \operatorname{diag}\begin{bmatrix} \alpha_j & 0 \\ 0 & \alpha_j \end{bmatrix}, \tag{12.83}$$

where $\alpha_j > 0$, while inequality (12.78) is easily seen to be equivalent to 

$$\alpha_j \geq \frac{1}{2}; \quad j = 1, \dots, s. \tag{12.84}$$

Relation (12.83) implies 

$$\phi(Tz) = \exp\left(i\tilde{m}z - \frac{1}{2} z^T \tilde{\alpha} z\right) = \exp \sum_{j=1}^{s}\left[i\left(\tilde{m}_j^q x_j + \tilde{m}_j^p y_j\right) - \frac{\alpha_j}{2}\left(x_j^2 + y_j^2\right)\right],$$

where $\tilde{m} = mT$. 

Denoting by $S^{(j)}$ the corresponding elementary one-mode Gaussian states (12.79) 
and putting 

$$S = \bigotimes_{j=1}^{s} S^{(j)}, \tag{12.85}$$

we obtain 

$$\phi(Tz) = \operatorname{Tr} S \exp\left(i\tilde{R}z\right) = \operatorname{Tr} S\, W(Tz),$$

where $\tilde{R} = RT$ are the new canonical observables. It follows that the state $S$ has the 
characteristic function $\phi(z)$. □ 

In quantum optics, the representation of the Gaussian state in the form (12.85) is 
usually related to the normal mode decomposition. 

Let $J$ be an operator of complex structure in $Z$ and $\{e^{i\varphi G}\}$ be the corresponding 
gauge group in $\mathcal{H}$. From (12.73) it follows that the Gaussian density operator $S$ 
is gauge-invariant, $e^{-i\varphi G} S e^{i\varphi G} = S$, $\varphi \in [0, 2\pi]$, if and only if its characteristic 
function satisfies the condition $\phi(e^{\varphi J} z) = \phi(z)$, which is equivalent to $m = 0$ and 
$J^T \alpha + \alpha J = 0$. By using (12.59), the last equality can be written as 

$$[J, A] = 0, \tag{12.86}$$

where $A = \Delta^{-1}\alpha$, which means that $J$ is the complex structure from the polar 
decomposition (12.61), i.e. $J = T\tilde{J}T^{-1}$, where $\tilde{J}$ is defined by relation (12.62). 

Notice that the elementary Gaussian state (12.79) is pure if and only if $N = 0$, 
i.e. $\alpha = 1/2$. Using the decomposition in (12.85) this implies several equivalent 
conditions for purity of a general Gaussian state. 

Exercise 12.19. A Gaussian state $S$ is pure if and only if one of the following 
equivalent conditions holds: 

i. $\alpha_j = 1/2;\ j = 1, \dots, s$ 

ii. $|\det(2A)| = 1$ 

iii. the operator $A = \Delta^{-1}\alpha$ satisfies 

$$A^2 = -\frac{I}{4} \tag{12.87}$$

iv. $\alpha = \frac{1}{2}\Delta J$, where $J$ is an operator of complex structure 

v. $\alpha$ is a minimal solution of the inequality (12.78) 

Hint: It is sufficient to check conditions ii.-iv. in the symplectic basis from 
Lemma 12.12. 

Under these conditions, the set $\mathfrak{S}(m, \alpha)$ consists of one pure Gaussian state. 
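
Condition iii. is easy to test numerically in one mode. The sketch below is only an illustration under stated assumptions: it takes $\hbar = 1$, the one-mode matrix $\Delta = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ (so that $\Delta^{-1} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$), and hypothetical helper names `matmul2` and `is_pure_one_mode` that do not appear in the text.

```python
from typing import List

Matrix = List[List[float]]


def matmul2(x: Matrix, y: Matrix) -> Matrix:
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def is_pure_one_mode(alpha: Matrix, tol: float = 1e-9) -> bool:
    """Check A^2 = -I/4 for A = Delta^{-1} alpha, cf. (12.87)."""
    delta_inv = [[0.0, -1.0], [1.0, 0.0]]  # inverse of [[0, 1], [-1, 0]]
    a = matmul2(delta_inv, alpha)
    a2 = matmul2(a, a)
    target = [[-0.25, 0.0], [0.0, -0.25]]
    return all(abs(a2[i][j] - target[i][j]) <= tol
               for i in range(2) for j in range(2))


if __name__ == "__main__":
    print(is_pure_one_mode([[0.5, 0.0], [0.0, 0.5]]))   # alpha = I/2
    print(is_pure_one_mode([[1.0, 0.5], [0.5, 0.5]]))   # det(alpha) = 1/4
    print(is_pure_one_mode([[1.5, 0.0], [0.0, 1.5]]))   # alpha_1 = 3/2 > 1/2
```

In one mode the test reduces to $\det\alpha = 1/4$, which is why the second matrix, with correlated $q$ and $p$, still passes.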

Let $S$ be a centered Gaussian state with the covariance matrix $\alpha$, let $J$ be an 
operator of complex structure from the polar decomposition (12.61), and let $S_0$ be the pure 
centered Gaussian state with the covariance matrix $\frac{1}{2}\Delta J$. Thus, both these states are 
gauge-invariant under the unitary group $\{\exp(i\varphi G)\}$ corresponding to this complex 
structure. 

Exercise 12.20. By using the normal mode decomposition (12.85), prove the 
following multimode generalization of formula (12.23): 

$$S = \int W(z)\, S_0\, W(z)^*\, P(d^{2s}z), \tag{12.88}$$

where $P$ is the Gaussian probability distribution with the characteristic function 

$$\int e^{i\Delta(w,z)}\, P(d^{2s}z) = \exp\left[-\frac{1}{2} w^T\left(\alpha - \frac{1}{2}\Delta J\right) w\right].$$

This distribution is invariant under the complex structure $J$ in $Z$. 

The gauge-invariant pure state $S_0$ plays the role of the vacuum state, while 
$W(-\Delta^{-1}z)\, S_0\, W(-\Delta^{-1}z)^*$ are the coherent states with respect to the complex 
structure $J$. Note that a state that is “squeezed” relative to a given complex structure can 
always be made “coherent” by using the complex structure associated with its covariance 
matrix. In physics, the distinguished complex structure is the one in which the 
oscillator Hamiltonian has the canonical form (12.69), and coherent/squeezed states 
are defined relative to that complex structure. 


12.3.3 The density operator of a Gaussian state 

The spectral decomposition of an arbitrary Gaussian state (12.85) can be obtained 
by tensor multiplication of the spectral decompositions of the one-mode states $S^{(j)}$ 
from the normal mode decomposition. We will prove a theorem that generalizes 
relation (12.25) to arbitrary Gaussian states without a pure component. 

Lemma 12.21. The Gaussian density operator $S$ is nondegenerate if and only if 

$$\det\left(\alpha + \frac{i}{2}\Delta\right) \neq 0.$$

Proof. The Gaussian density operator $S$ is nondegenerate if and only if it has no pure 
component, i.e. $\alpha_j > 1/2;\ j = 1, \dots, s$. We will prove the lemma by establishing the 
identities 

$$\det\left(\alpha + \frac{i}{2}\Delta\right) = \left[\det\left(-A^2 - \frac{I}{4}\right)\right]^{1/2} = \prod_{j=1}^{s}\left(\alpha_j^2 - \frac{1}{4}\right), \tag{12.89}$$

where $A = \Delta^{-1}\alpha$. Choosing the symplectic basis as in Lemma 12.12, we have 
$A = T(\Delta^{-1}\tilde{\alpha})T^{-1}$, where 

$$\Delta^{-1}\tilde{\alpha} = \operatorname{diag}\begin{bmatrix} 0 & -\alpha_j \\ \alpha_j & 0 \end{bmatrix}. \tag{12.90}$$

Hence, the numbers $\pm i\alpha_j;\ j = 1, \dots, s$ are the eigenvalues of the operator $A$. It 
follows that the symmetric operator $-A^2 - \frac{1}{4}I$ in the Euclidean space $(Z, \alpha)$ is positive, 
since its eigenvalues are $\alpha_j^2 - 1/4 > 0$ (notice that, by (12.87), this operator is equal to 
0 if and only if the state $S$ is pure). Each of these eigenvalues appears twice. Hence, 
the second identity in (12.89) follows. 

To prove the first identity, note that 

$$-(\Delta^{-1}\alpha)^2 - \frac{1}{4}I = -\Delta^{-1}\left(\alpha + \frac{i}{2}\Delta\right)\Delta^{-1}\left(\alpha - \frac{i}{2}\Delta\right).$$

Taking into account that $\det(\pm\Delta^{-1}) = 1$ and $\det\left(\alpha + \frac{i}{2}\Delta\right) = \det\left(\alpha - \frac{i}{2}\Delta\right) \geq 0$, 
we obtain the result. □ 

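Theorem 12.22. Let the matrix $\alpha + \frac{i}{2}\Delta$ be nondegenerate. In this case, the centered 
Gaussian density operator with covariance matrix $\alpha$ has the form 

$$S_0 = c \exp\left(-R\epsilon R^T\right), \tag{12.91}$$

where 

$$c = \left[\det\left(\alpha + \frac{i}{2}\Delta\right)\right]^{-1/2} = \left[\det\left(-4\sin^2 \epsilon\Delta\right)\right]^{1/4}, \tag{12.92}$$

and $\epsilon$ is found from the relation 

$$2\Delta^{-1}\alpha = \cot \epsilon\Delta. \tag{12.93}$$

Proof. Rewriting the relation (12.25) (for $\omega = 1$) in the form 

$$S = \left(e^{\theta/2} - e^{-\theta/2}\right) \exp\left[-\frac{\theta}{2}\left(q^2 + p^2\right)\right],$$

and using the normal mode decomposition (12.83) diagonalizing $\alpha$ together with (12.85), we have 

$$S_0 = c \exp\left(-\tilde{R}\tilde{\epsilon}\tilde{R}^T\right),$$

where 

$$c = \prod_{j=1}^{s}\left(e^{\theta_j/2} - e^{-\theta_j/2}\right), \qquad \tilde{\epsilon} = \operatorname{diag}\begin{bmatrix} \theta_j/2 & 0 \\ 0 & \theta_j/2 \end{bmatrix},$$

and $\theta_j = \ln\dfrac{\alpha_j + 1/2}{\alpha_j - 1/2}$, so that 

$$e^{\theta_j/2} - e^{-\theta_j/2} = \left(\alpha_j^2 - \frac{1}{4}\right)^{-1/2}. \tag{12.94}$$

Taking into account (12.89), the first of the expressions (12.92) follows. Returning to 
the initial canonical observables $R$, we obtain (12.91), where $\epsilon = T\tilde{\epsilon}T^T$. 

Inverting the formula (12.94) produces 

$$\alpha_j = \frac{1}{2}\coth\frac{\theta_j}{2}. \tag{12.95}$$

Further, 

$$\tilde{\epsilon}\Delta = \operatorname{diag}\begin{bmatrix} 0 & \theta_j/2 \\ -\theta_j/2 & 0 \end{bmatrix}.$$

Note that $\epsilon\Delta = T\tilde{\epsilon}\Delta T^{-1}$ and $\Delta^{-1}\alpha = T\Delta^{-1}\tilde{\alpha}T^{-1}$ are the matrices of operators, 
with $\Delta^{-1}\tilde{\alpha}$ given by the relation (12.90). The operators $-\epsilon\Delta$ and $\Delta^{-1}\alpha$ have 
the same eigenvectors, with eigenvalues $\pm i\theta_j/2$ and $\pm i\alpha_j$. Therefore, via the relation 
$\cot(i\lambda) = -i\coth\lambda$, (12.95) implies (12.93). □ 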
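
The one-mode relations used in this proof can be checked numerically. The following sketch assumes only the formulas (12.94) and (12.95) as written above; the function names are illustrative. It maps a symplectic eigenvalue $\alpha_j > 1/2$ to $\theta_j$, maps it back, and evaluates the normalizing constant $c$ in the two equivalent ways.

```python
import math
from typing import Sequence


def theta_from_alpha(alpha_j: float) -> float:
    """theta_j = ln((alpha_j + 1/2) / (alpha_j - 1/2)); needs alpha_j > 1/2."""
    if alpha_j <= 0.5:
        raise ValueError("a nondegenerate mode requires alpha_j > 1/2")
    return math.log((alpha_j + 0.5) / (alpha_j - 0.5))


def alpha_from_theta(theta_j: float) -> float:
    """Inverse relation (12.95): alpha_j = (1/2) coth(theta_j / 2)."""
    return 0.5 / math.tanh(theta_j / 2.0)


def normalizer_from_thetas(alphas: Sequence[float]) -> float:
    """c as the product of e^{theta_j/2} - e^{-theta_j/2} = 2 sinh(theta_j/2)."""
    c = 1.0
    for a in alphas:
        c *= 2.0 * math.sinh(theta_from_alpha(a) / 2.0)
    return c


def normalizer_from_alphas(alphas: Sequence[float]) -> float:
    """c as the product of (alpha_j^2 - 1/4)^(-1/2), cf. (12.89) and (12.92)."""
    c = 1.0
    for a in alphas:
        c *= (a * a - 0.25) ** -0.5
    return c


if __name__ == "__main__":
    modes = [0.6, 1.5, 3.0]
    print(normalizer_from_thetas(modes), normalizer_from_alphas(modes))
```

Both routines should agree up to rounding, which is the content of (12.94) multiplied over the modes.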


12.3.4 Entropy of a Gaussian state 

To compute the von Neumann entropy of a general Gaussian state, one can use the 
normal mode decomposition (12.85). By (12.81), the entropy is the same for all Gaussian 
states with arbitrary mean and covariance matrix $\alpha$, reducing the problem to the case 
of a centered state. For a single mode ($s = 1$), the density operator $S$ is unitarily 
equivalent to the state (12.24), with entropy equal to $g(N)$. For a general Gaussian 
state we obtain, by summing over normal modes, 

$$H(S) = \sum_{j=1}^{s} g(N_j), \tag{12.96}$$

where $N_j = \alpha_j - 1/2 \geq 0$. 

To write this in coordinate-free form, recall that the operator $A = \Delta^{-1}\alpha$ has 
eigenvalues $\pm i\alpha_j$. Hence, its matrix is diagonalizable (in the complex domain). For any 
diagonalizable matrix $M = U\operatorname{diag}(m_j)U^{-1}$, we set $\operatorname{abs}(M) = U\operatorname{diag}(|m_j|)U^{-1}$, 
similar to other continuous functions on the complex plane. Now, equation (12.96) 
can be written as 

$$H(S) = \frac{1}{2}\operatorname{Sp}\, g\left(\operatorname{abs}(\Delta^{-1}\alpha) - \frac{I}{2}\right), \tag{12.97}$$

where Sp is used to denote the trace of a matrix, as distinct from the trace Tr of 
operators in the underlying Hilbert space. 

An alternative expression for the entropy of a Gaussian state can be obtained from 
the representation (12.91). By definition, we have 

$$H(S) = -\log c + \operatorname{Sp}\epsilon\alpha,$$

which, by (12.92) and (12.93), is equal to 

$$\frac{1}{4}\operatorname{Sp}\log\left[-(\Delta^{-1}\alpha)^2 - \frac{I}{4}\right] + \operatorname{Sp}\,(\Delta^{-1}\alpha)\operatorname{arccot}(2\Delta^{-1}\alpha) = -\frac{1}{4}\operatorname{Sp}\log\left(-4\sin^2\epsilon\Delta\right) + \frac{1}{2}\operatorname{Sp}\,(\epsilon\Delta\cot\epsilon\Delta). \tag{12.98}$$
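
The two entropy expressions can be compared mode by mode. The sketch below is an illustration under stated assumptions: it takes $g(x) = (x+1)\log(x+1) - x\log x$ with natural logarithms (the text fixes $g$ in the one-mode discussion, and the base of the logarithm here is a choice of this example), and it evaluates (12.98) in the normal mode basis, where each mode contributes $\frac{1}{2}\log(\alpha_j^2 - \frac14) + \alpha_j\theta_j$ with $\theta_j$ as in (12.94). The function names are hypothetical.

```python
import math
from typing import Sequence


def g(x: float) -> float:
    """g(x) = (x + 1) log(x + 1) - x log x, with g(0) = 0 (natural log)."""
    if x < 0.0:
        raise ValueError("g is defined for x >= 0")
    if x == 0.0:
        return 0.0
    return (x + 1.0) * math.log(x + 1.0) - x * math.log(x)


def entropy_normal_modes(alphas: Sequence[float]) -> float:
    """Formula (12.96): sum of g(alpha_j - 1/2) over the normal modes."""
    return sum(g(a - 0.5) for a in alphas)


def entropy_alternative(alphas: Sequence[float]) -> float:
    """Formula (12.98) in the normal mode basis; needs every alpha_j > 1/2."""
    total = 0.0
    for a in alphas:
        theta = math.log((a + 0.5) / (a - 0.5))
        total += 0.5 * math.log(a * a - 0.25) + a * theta
    return total


if __name__ == "__main__":
    modes = [0.75, 1.5, 4.0]
    print(entropy_normal_modes(modes), entropy_alternative(modes))
```

Note that `entropy_alternative` is restricted to nondegenerate states, in line with the hypothesis of Theorem 12.22, while `entropy_normal_modes` also covers pure components with $\alpha_j = 1/2$.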


Exercise 12.23. Use the relation 

to show that (12.97) and (12.98) are the same. 

Example 12.24. In the case of a one-mode Gaussian density operator, we have 

$$\alpha = \begin{bmatrix} \alpha^{qq} & \alpha^{qp} \\ \alpha^{qp} & \alpha^{pp} \end{bmatrix}$$

with $\alpha^{qq}\alpha^{pp} - (\alpha^{qp})^2 \geq \frac{1}{4}$ (this inequality is equivalent to 
condition (12.78)). In this case, 

$$-(\Delta^{-1}\alpha)^2 = \left[\alpha^{qq}\alpha^{pp} - (\alpha^{qp})^2\right] \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

Hence, 

$$\operatorname{abs}(\Delta^{-1}\alpha) = \sqrt{\alpha^{qq}\alpha^{pp} - (\alpha^{qp})^2}\, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

and 

$$H(S) = g\left(\sqrt{\alpha^{qq}\alpha^{pp} - (\alpha^{qp})^2} - \frac{1}{2}\right).$$

Geometrically, the quantity $\pi\sqrt{\alpha^{qq}\alpha^{pp} - (\alpha^{qp})^2}$ is equal to the area of the deviation 
ellipsoid 

$$z^T \alpha^{-1} z = 1, \quad z = [x, y]^T,$$

for the two-dimensional Gaussian distribution with covariance matrix $\alpha$. 

Note that, in accordance with (12.87), the Gaussian state is pure if and only if it has 
the minimal uncertainty $\alpha^{qq}\alpha^{pp} - (\alpha^{qp})^2 = \frac{1}{4}$. This comprises both coherent states, 
for which 

$$\alpha^{qq} = \alpha^{pp} = \frac{1}{2}, \quad \alpha^{qp} = 0,$$

and squeezed states, for which this condition is violated (as distinct from quantum 
optics, we use here a real rather than complex parametrization). The case of many modes 
can be treated by using the normal mode decomposition. 
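
The one-mode formula of this example translates directly into a few lines of code. This is a sketch under the same assumptions as the example ($\hbar = 1$), with $g(x) = (x+1)\log(x+1) - x\log x$ in natural logarithms; the function names and the rounding tolerance are illustrative choices.

```python
import math


def g(x: float) -> float:
    """g(x) = (x + 1) log(x + 1) - x log x, with g(0) = 0 (natural log)."""
    if x <= 0.0:
        return 0.0
    return (x + 1.0) * math.log(x + 1.0) - x * math.log(x)


def one_mode_entropy(a_qq: float, a_qp: float, a_pp: float,
                     tol: float = 1e-12) -> float:
    """H(S) = g(sqrt(det alpha) - 1/2) for a one-mode Gaussian state."""
    det = a_qq * a_pp - a_qp ** 2
    if a_qq < 0.0 or a_pp < 0.0 or det < 0.25 - tol:
        raise ValueError("alpha violates the uncertainty relation (12.78)")
    # Clamp tiny negative values caused by rounding at the pure-state boundary.
    return g(max(math.sqrt(det) - 0.5, 0.0))


def deviation_ellipse_area(a_qq: float, a_qp: float, a_pp: float) -> float:
    """Area pi * sqrt(det alpha) of the ellipse z^T alpha^{-1} z = 1."""
    return math.pi * math.sqrt(a_qq * a_pp - a_qp ** 2)


if __name__ == "__main__":
    print(one_mode_entropy(0.5, 0.0, 0.5))     # coherent-type covariance
    print(one_mode_entropy(2.0, 0.0, 0.125))   # squeezed, still pure
    print(one_mode_entropy(1.5, 0.0, 1.5))     # mixed state with N = 1
```

The first two calls return zero because both covariance matrices have determinant $1/4$; only the third state has positive entropy.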

While pure Gaussian states are the minimal uncertainty states for the canonical 
observables, a general Gaussian state is characterized by the following property of 
maximal entropic uncertainty under fixed moments of the canonical observables. 

Lemma 12.25. The Gaussian state has the largest entropy among all states with given 
mean m and correlation matrix a. Therefore, for any set of states defined by restric¬ 
tions on the first and second moments, the maximum of the entropy can always be 
sought among the Gaussian states. 

Proof. Let $S \in \mathfrak{S}(m, \alpha)$ and let $\tilde{S}$ be the unique Gaussian state in $\mathfrak{S}(m, \alpha)$. Without 
loss of generality we may assume that $\tilde{S}$ is nondegenerate. The general case can be 
reduced to this one by separating the pure component in the tensor product 
decomposition of $\tilde{S}$. Indeed, by performing the normal mode decomposition, we can assume 
that $Z = Z_1 \oplus Z_2$ with $\Delta = \Delta_1 \oplus \Delta_2$, $\alpha = \alpha_1 \oplus \alpha_2$ (see the next section for 
the direct sum decompositions), where $\alpha_1$ corresponds to the unique, necessarily pure 
Gaussian state in $Z_1$, while $\alpha_2$ corresponds to a nondegenerate density operator. We 
have 

$$H(\tilde{S}) - H(S) = H(S; \tilde{S}) + \operatorname{Tr}\,(S - \tilde{S}) \log \tilde{S},$$

where the last term is equal to zero, because, by (12.91), for a nondegenerate Gaussian 
$\tilde{S}$, the operator $\log\tilde{S}$ is a second order polynomial in the canonical variables, while the 
first and second moments of $S, \tilde{S}$ coincide. Thus, $H(\tilde{S}) - H(S) = H(S; \tilde{S}) \geq 0$. □ 




A similar result holds for the conditional quantum entropy [55]. In its formulation 
and proof, we use the description of composite bosonic systems that is introduced in 
the next section. 

Lemma 12.26. The Gaussian state has the largest conditional entropy among all 
states of a composite bosonic system $AB$ with the given mean $m$ and the correlation 
matrix $\alpha$. 

Proof. Let $S_{AB} \in \mathfrak{S}(m_{AB}, \alpha_{AB})$ and let $\tilde{S}_{AB}$ be the unique Gaussian state in 
$\mathfrak{S}(m_{AB}, \alpha_{AB})$. Then $S_B \in \mathfrak{S}(m_B, \alpha_B)$ and $\tilde{S}_B$ is the Gaussian state in $\mathfrak{S}(m_B, \alpha_B)$. 
Again, we may assume that $\tilde{S}_{AB}$ is nondegenerate. Denoting 

$$H(A|B) = H(S_{AB}) - H(S_B); \qquad \tilde{H}(A|B) = H(\tilde{S}_{AB}) - H(\tilde{S}_B),$$

we have 

$$\begin{aligned}
\tilde{H}(A|B) - H(A|B) &= H(S_{AB}; \tilde{S}_{AB}) - H(S_B; \tilde{S}_B) \\
&\quad + \operatorname{Tr}\,(S_{AB} - \tilde{S}_{AB}) \log \tilde{S}_{AB} - \operatorname{Tr}\,(S_B - \tilde{S}_B) \log \tilde{S}_B \\
&= H(S_{AB}; \tilde{S}_{AB}) - H(S_B; \tilde{S}_B),
\end{aligned}$$

because, by (12.91), $\log\tilde{S}_{AB}$ and $\log\tilde{S}_B$ are second order polynomials in the 
canonical variables. By monotonicity of the relative entropy under the partial trace, 
$H(S_{AB}; \tilde{S}_{AB}) - H(S_B; \tilde{S}_B) \geq 0$. □ 


12.3.5 Separability and purification 

Consider two bosonic systems, described by CCR, with the symplectic spaces 
$(Z_1, \Delta_1)$, $(Z_2, \Delta_2)$. The symplectic space of the composite system is the direct sum 
$Z = Z_1 \oplus Z_2$, the elements of which are conveniently represented by the column 
vectors $z = [z_1, z_2]^T$, while the symplectic matrix $\Delta_{12}$ is block diagonal: 

$$\Delta_{12} = \begin{bmatrix} \Delta_1 & 0 \\ 0 & \Delta_2 \end{bmatrix}.$$

The Weyl operators of the composite system are defined as $W_{12}(z_1, z_2) = 
W_1(z_1) \otimes W_2(z_2)$. 

Let $S_{12}$ be a Gaussian state of the composite system with the mean $m_{12}$ and the 
covariance matrix $\alpha_{12}$. The restriction of the Gaussian state $S_{12}$ to the first factor is 
determined by the expectations of the Weyl operators $W_1(z_1) \otimes I_2 = W_{12}(z_1, 0)$. 
Hence, according to (12.80), it has the mean $m_1$, which is just the first component of 
$m_{12} = [m_1\ m_2]$, and the covariance matrix $\alpha_1$, which is the first diagonal block in 
the block matrix decomposition 

$$\alpha_{12} = \begin{bmatrix} \alpha_1 & \beta \\ \beta^T & \alpha_2 \end{bmatrix}. \tag{12.99}$$

The covariance matrix $\alpha_{12}$ of such a state $S_{12}$ of the composite system is block diagonal 
($\beta = 0$) if and only if the state is a product $S_{12} = S_1 \otimes S_2$. 


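Proposition 12.27. [217] A Gaussian state is separable if and only if there exist real 
symmetric matrices $\tilde{\alpha}_j$ satisfying 

$$\tilde{\alpha}_j \geq \pm\frac{i}{2}\Delta_j; \quad j = 1, 2, \tag{12.100}$$

such that 

$$\alpha_{12} \geq \begin{bmatrix} \tilde{\alpha}_1 & 0 \\ 0 & \tilde{\alpha}_2 \end{bmatrix}. \tag{12.101}$$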

Proof. By using formula (12.81) and the fact that the action of the displacement 
operators is “local” and has no effect on (non)separability, we can assume, without loss of 
generality, that $S_{12}$ is centered. Let $S_{12}$ be separable. In this case, it has a 
representation (11.43). Since $S_{12}$ has finite second moments, this representation implies that 
the states $S_1(x), S_2(x)$ have finite second moments for $\pi$-almost all $x$, hence their 
mean vectors $m_j(x)$ and covariance matrices $\alpha_j(x) \geq \pm\frac{i}{2}\Delta_j;\ j = 1, 2$ are defined 
for $\pi$-almost all $x$. In view of (12.77), the representation in (11.43) implies that $\alpha_{12}$ 
has the block decomposition (12.99), with 

$$\alpha_j = \int_X \left(\alpha_j(x) + m_j(x)^T m_j(x)\right) \pi(dx); \quad j = 1, 2,$$

$$\beta = \int_X m_1(x)^T m_2(x)\, \pi(dx).$$

Denoting $\tilde{\alpha}_j = \int_X \alpha_j(x)\, \pi(dx)$, we have (12.100) and 

$$\alpha_{12} - \begin{bmatrix} \tilde{\alpha}_1 & 0 \\ 0 & \tilde{\alpha}_2 \end{bmatrix} = \begin{bmatrix} \alpha_1 - \tilde{\alpha}_1 & \beta \\ \beta^T & \alpha_2 - \tilde{\alpha}_2 \end{bmatrix} = \int_X \begin{bmatrix} m_1(x)^T m_1(x) & m_1(x)^T m_2(x) \\ m_2(x)^T m_1(x) & m_2(x)^T m_2(x) \end{bmatrix} \pi(dx) \geq 0.$$

Conversely, if (12.101) holds, let 

$$\gamma_{12} = \alpha_{12} - \begin{bmatrix} \tilde{\alpha}_1 & 0 \\ 0 & \tilde{\alpha}_2 \end{bmatrix} \geq 0$$

and define $\pi$ to be the Gaussian probability distribution on $Z = Z_1 \oplus Z_2$ with the 
characteristic function 

$$\int_Z e^{-i\Delta(w,z)}\, \pi(d^{2s}z) = \exp\left(-\frac{1}{2} w^T \gamma_{12} w\right).$$

Then comparison of the quantum characteristic functions shows that 

$$S_{12} = \int_Z W_1(z_1)\tilde{S}_1 W_1(z_1)^* \otimes W_2(z_2)\tilde{S}_2 W_2(z_2)^*\, \pi(d^{2s}z), \tag{12.102}$$

where $\tilde{S}_j$ is the centered Gaussian state with covariance matrix $\tilde{\alpha}_j$. Hence, $S_{12}$ is 
separable. □ 

This proof shows that, in fact, a separable Gaussian state can be represented as a 
Gaussian mixture of product Gaussian states. 

From (12.101) and (12.78) it follows that the covariance matrix of a separable state, 
in addition to the usual condition $\alpha_{12} \geq \pm\frac{i}{2}\Delta_{12}$, satisfies 

$$\alpha_{12} \geq \pm\frac{i}{2}\begin{bmatrix} \Delta_1 & 0 \\ 0 & -\Delta_2 \end{bmatrix}. \tag{12.103}$$

This condition expresses the property of the state $S_{12}$ having a positive partial 
transpose (PPT) with respect to the second system. In fact, changing the sign of the 
symplectic form $\Delta_2$ corresponds to taking the transpose of the Weyl operators $W_2(z_2)$, as 
follows from the CCR (12.47). As shown in [217], the PPT condition is in general 
weaker than the separability condition (12.101), which means that there exist bound 
entangled Gaussian states. 

When considering purification, it is also sufficient to restrict ourselves to centered 
Gaussian states. Purification of the one-mode Gaussian state can be performed by 
using the spectral decomposition (12.24) and the general purification recipe of 
Theorem 3.11. Then for the many-mode state one can use the decomposition (12.85). This 
leads to rather lengthy computations [90], and therefore we will only give the final 
result, and check that it satisfies all necessary requirements. 

Now, let $S_1$ be a Gaussian state with the covariance matrix $\alpha_1$ on the symplectic 
space $Z_1$ with the symplectic form $\Delta_1 = \Delta$. It appears convenient to take $Z_2 = Z_1$ with 
$\Delta_2 = -\Delta$, so that 

$$\Delta_{12} = \begin{bmatrix} \Delta & 0 \\ 0 & -\Delta \end{bmatrix}. \tag{12.104}$$

Theorem 12.28. Consider the symplectic space $Z = Z_1 \oplus Z_1$ and the Gaussian 
state $S_{12}$ with the covariance matrix 

$$\alpha_{12} = \begin{bmatrix} \alpha_1 & \alpha_1^{1/2} B^{-1}\sqrt{-B^2 - I/4}\;\alpha_1^{1/2} \\ -\alpha_1^{1/2} B^{-1}\sqrt{-B^2 - I/4}\;\alpha_1^{1/2} & \alpha_1 \end{bmatrix}, \tag{12.105}$$

where $B = \alpha_1^{1/2}\Delta^{-1}\alpha_1^{1/2}$. Then $S_{12}$ is a purification of $S_1$. 

Proof. We have seen in the proof of Lemma 12.21 that the operator $A = \Delta^{-1}\alpha_1$ 
is skew-symmetric in the Euclidean space $(Z, \alpha_1)$. Moreover, $-A^2 - I/4 \geq 0$ with 
equality if and only if the Gaussian state is pure. Consider the block operator 

$$A_{12} = \begin{bmatrix} A & \sqrt{-A^2 - I/4} \\ \sqrt{-A^2 - I/4} & -A \end{bmatrix} \tag{12.106}$$

in $Z = Z_1 \oplus Z_1$, where the square root is the operator square root in the space 
$(Z, \alpha_1)$. It is easy to check that $-A_{12}^2 - I_{12}/4 = 0$ and, hence, the form $\alpha_{12} = 
\Delta_{12}A_{12}$ corresponds to the covariance matrix of a pure Gaussian state. Moreover, its 
restriction to $Z_1$ coincides with $\alpha_1$. Formula (12.105) then gives the matrix expression 
for $\alpha_{12}$. Notice that the matrix (12.105) is symmetric nonnegative definite, which 
follows from the fact that $-B^2 - I/4 \geq 0$, while $B$ is skew-symmetric. □ 


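In the case of the elementary Gaussian state (12.79) in one mode, with 

$$\alpha_1 = \begin{bmatrix} N + 1/2 & 0 \\ 0 & N + 1/2 \end{bmatrix},$$

formulas (12.104), (12.105) amount to 

$$\Delta_{12} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \qquad \alpha_{12} = \begin{bmatrix} N + 1/2 & 0 & 0 & \sqrt{N^2 + N} \\ 0 & N + 1/2 & -\sqrt{N^2 + N} & 0 \\ 0 & -\sqrt{N^2 + N} & N + 1/2 & 0 \\ \sqrt{N^2 + N} & 0 & 0 & N + 1/2 \end{bmatrix},$$

so that 

$$\Delta_{12}^{-1}\alpha_{12} = \begin{bmatrix} 0 & -(N + 1/2) & \sqrt{N^2 + N} & 0 \\ N + 1/2 & 0 & 0 & \sqrt{N^2 + N} \\ \sqrt{N^2 + N} & 0 & 0 & N + 1/2 \\ 0 & \sqrt{N^2 + N} & -(N + 1/2) & 0 \end{bmatrix}. \tag{12.107}$$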
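
The purity of this two-mode state can be confirmed numerically through condition (12.87), applied to $A_{12} = \Delta_{12}^{-1}\alpha_{12}$. The following sketch builds the matrices of the one-mode example for a given $N$ and measures how far $A_{12}^2$ is from $-I/4$. The helper names are illustrative; the matrices are the ones displayed in (12.104), (12.105) and (12.107) for $\alpha_1 = (N + 1/2)I$.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Product of two square matrices given as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def purified_covariance(n_mean: float) -> Matrix:
    """alpha_12 for the one-mode state with alpha_1 = (N + 1/2) I."""
    a = n_mean + 0.5
    c = math.sqrt(n_mean * n_mean + n_mean)
    return [[a, 0.0, 0.0, c],
            [0.0, a, -c, 0.0],
            [0.0, -c, a, 0.0],
            [c, 0.0, 0.0, a]]


def purity_defect(n_mean: float) -> float:
    """Largest entry of |A_12^2 + I/4|, where A_12 = Delta_12^{-1} alpha_12."""
    # Delta_12 = diag(Delta, -Delta) is orthogonal, so its inverse is its transpose.
    delta12_inv = [[0.0, -1.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0],
                   [0.0, 0.0, -1.0, 0.0]]
    a12 = matmul(delta12_inv, purified_covariance(n_mean))
    sq = matmul(a12, a12)
    return max(abs(sq[i][j] + (0.25 if i == j else 0.0))
               for i in range(4) for j in range(4))


if __name__ == "__main__":
    for n_mean in (0.0, 0.5, 2.0):
        print(n_mean, purity_defect(n_mean))
```

The defect should vanish up to rounding for every $N \geq 0$, while the upper left $2 \times 2$ block of `purified_covariance` is the original $\alpha_1$.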


12.4 Gaussian channels 

12.4.1 Open bosonic systems 

In this subsection, we consider a class of quantum channels that arise naturally from 
the interaction of a bosonic system with a bosonic environment. Let $Z_A, Z_B$ be 
symplectic spaces describing the input and output of the channel, and let $Z_D, Z_E$ be their 
corresponding environments, so that 

$$Z_A \oplus Z_D = Z_B \oplus Z_E = Z, \tag{12.108}$$

and let $W_A(z_A), \dots$ be the Weyl operators in the Hilbert spaces $\mathcal{H}_A, \dots$ of the 
corresponding bosonic systems. Assume that the composite system in the Hilbert space 

$$\mathcal{H} = \mathcal{H}_A \otimes \mathcal{H}_D = \mathcal{H}_B \otimes \mathcal{H}_E \tag{12.109}$$

is prepared in the initial state $S_A \otimes S_D$ and evolves according to the unitary operator 
$U_T$, which corresponds to a symplectic transformation $T$ in $Z$ by (12.64). In 
accordance with the direct sum decompositions (12.108), $T$ can be written in block matrix 
form 

$$T = \begin{bmatrix} K & L \\ K_D & L_D \end{bmatrix}, \tag{12.110}$$

where $K: Z_B \to Z_A$, $L: Z_E \to Z_A$, $K_D: Z_B \to Z_D$, $L_D: Z_E \to Z_D$. 
The characteristic function of the system state after the interaction described by this 
unitary transformation $U_T$ is 

$$\begin{aligned}
\phi_B(z_B) &= \operatorname{Tr} U_T\,(S_A \otimes S_D)\, U_T^*\, (W_B(z_B) \otimes I_E) \\
&= \operatorname{Tr}\,(S_A \otimes S_D)\, U_T^*\, (W_B(z_B) \otimes W_E(0))\, U_T \\
&= \operatorname{Tr}\,(S_A \otimes S_D)\, (W_A(Kz_B) \otimes W_D(K_D z_B)) \\
&= \phi_A(Kz_B) \cdot \phi_D(K_D z_B),
\end{aligned}$$

where $\phi_D(z_D) = \operatorname{Tr} S_D W_D(z_D)$ is the characteristic function of the initial state of 
the environment. This transformation can be written in the form 

$$\phi_B(z_B) = \phi_A(Kz_B) \cdot f(z_B), \tag{12.111}$$

where 

$$f(z_B) = \phi_D(K_D z_B). \tag{12.112}$$

If the initial state of the environment is Gaussian, with parameters $(m_D, \alpha_D)$, $f$ 
has the form of the Gaussian characteristic function 

$$f(z_B) = \exp\left[i\, l(z_B) - \frac{1}{2}\alpha(z_B, z_B)\right], \tag{12.113}$$

where $l(z_B) = m_D(K_D z_B)$ and 

$$\alpha(z_B, z'_B) = \alpha_D(K_D z_B, K_D z'_B). \tag{12.114}$$

Transformation of states according to the formula (12.111) is a concatenation of the 
unitary evolution and partial trace, which are channels in the sense of Definition 11.16, 
and hence defines a quantum channel $\Phi: \mathfrak{T}(\mathcal{H}_A) \to \mathfrak{T}(\mathcal{H}_B)$. 

Definition 12.29. A channel that transforms states according to the rule (12.111) 
is called linear bosonic. If, additionally, $f$ is a Gaussian characteristic 
function (12.113), where $l$ is a linear form and $\alpha$ is an inner product on $Z_B$, the channel 
is called Gaussian, with parameters $(K, l, \alpha)$. 

Let $R_A, \dots$ be the vectors of canonical observables in $\mathcal{H}_A, \dots$, with the 
commutation matrices $\Delta_A, \dots$. From the tensor product decomposition (12.109), we have two 
different splits of the set of canonical observables of the composite system: 

$$[R_B\ \ R_E], \qquad [R_A\ \ R_D].$$

According to (12.65), the action of the operator $U_T$ is described by the linear 
transformation 

$$[R'_B\ \ R'_E] = U_T^*\,[R_B\ \ R_E]\,U_T = [R_A\ \ R_D]\,T, \tag{12.115}$$

where $T$ is the block matrix (12.110) and, to simplify the notation, we write $R_A, \dots$ 
instead of $R_A \otimes I_D, \dots$. In particular, 

$$R'_B = R_A K + R_D K_D. \tag{12.116}$$

The commutation and the covariance matrices of the canonical observables $R'_B$ at 
the output of the channel are given by the expression 

$$\alpha_B + \frac{i}{2}\Delta_B = \operatorname{Tr}\,(R'_B - m_B)^T S\, (R'_B - m_B),$$

where $S = S_A \otimes S_D$ and $S_A, S_D$ are density operators in $\mathcal{H}_A$ and $\mathcal{H}_D$, with the 
covariance matrices $\alpha_A$ and $\alpha_D$, respectively. By using (12.116), we obtain 

$$\Delta_B = K^T \Delta_A K + K_D^T \Delta_D K_D, \tag{12.117}$$

$$\alpha_B = K^T \alpha_A K + K_D^T \alpha_D K_D. \tag{12.118}$$

Note that in the Gaussian case $\alpha = K_D^T \alpha_D K_D$ is the matrix of the corresponding 
form that enters the expression for the function (12.113) defining the channel. Since 
$\alpha_D \geq \pm\frac{i}{2}\Delta_D$, the relation (12.117) implies the inequality 

$$\alpha \geq \pm\frac{i}{2}\left[\Delta_B - K^T \Delta_A K\right], \tag{12.119}$$

which will play a key role in what follows. 

Equation (12.116) is the “input-output relation” describing the channel in the way 
closest to the classical description in terms of “signal plus Gaussian noise”. Here, 
$R_A$ is the quantum signal and $N_D = R_D K_D$ is the Gaussian noise with the 
covariance matrix $\alpha = K_D^T \alpha_D K_D$. Note that the noise can be partially quantum 
and partially classical, because of the possible degeneracy of the commutator matrix 
$\Delta_B - K^T \Delta_A K$. 

The final state of the environment is defined by a similar equation for the canonical 
observables $R_E$ after the interaction: 

$$R'_E = R_A L + R_D L_D.$$

The weakly complementary channel $\Phi_E: S_A \to S_E$ (see Section 6.6) is defined by 
the relation 

$$\phi_E(z_E) = \phi_A(Lz_E) \cdot \phi_D(L_D z_E),$$

and is also linear bosonic, and it is Gaussian if the state $S_D$ is Gaussian. If the initial 
state of the environment is pure, $S_D = |\psi_D\rangle\langle\psi_D|$, the channel $\Phi_E$ is complementary 
to $\Phi$, with the Stinespring isometry given by the formula $V = U_T|\psi_D\rangle$. 

Theorem 12.30. Inequality (12.119) is a necessary and sufficient condition for a set 
of parameters $(K, l, \alpha)$ to define a Gaussian channel. 

Proof. The necessity of condition (12.119) was established before. Let us now show 
that under this condition one can construct the symplectic transformation $T$ defining 
the dynamics of the open bosonic system which gives rise to the required Gaussian 
channel. In doing this, we can always restrict ourselves to the centered case ($l = 0$) 
by performing a displacement operation (12.81). 

First of all, notice that, for the given operators $K, K_D$ satisfying (12.117), one can 
always extend the transformation (12.116) to a symplectic transformation $T$. Indeed, 
relation (12.117) means that the transformation $z_B \to [Kz_B, K_D z_B]^T$ is a symplectic 
embedding of $Z_B$ into $Z_A \oplus Z_D = Z$, i.e. the columns of the matrix $[K, K_D]^T$ 
form a system that can be complemented to a symplectic basis in $Z$, which forms the 
matrix $T$. Thus, the problem reduces to the following. For given $K, \alpha$ satisfying the 
inequality (12.119), find a symplectic space $(Z_D, \Delta_D)$, an operator $K_D: Z_B \to Z_D$, 
and an inner product in $Z_D$, defined by a symmetric matrix $\alpha_D \geq \pm\frac{i}{2}\Delta_D$, such that 

$$K_D^T \Delta_D K_D = \Delta_B - K^T \Delta_A K \equiv \tilde{\Delta}_D; \tag{12.120}$$

$$K_D^T \alpha_D K_D = \alpha. \tag{12.121}$$

Assume first that $\alpha$ is nondegenerate. Consider the Euclidean space $(Z_B, \alpha)$ with 
the skew-symmetric form $\tilde{\Delta}_D$. Let $2n = \dim Z_B$. By the assumption, $\alpha \geq \pm\frac{i}{2}\tilde{\Delta}_D$. 
Consider the skew-symmetric operator $S$, defined by the relation 

$$\tilde{\Delta}_D(z_B, z'_B) = \alpha(z_B, Sz'_B); \quad z_B, z'_B \in Z_B.$$

According to a theorem from linear algebra, there is an orthonormal basis $e_j;\ j = 
1, \dots, 2n - l$; $h_j;\ j = 1, \dots, l$ in $(Z_B, \alpha)$ such that 

$$Se_j = s_j h_j, \quad Sh_j = -s_j e_j; \quad j = 1, \dots, l; \qquad Se_j = 0; \quad j = l + 1, \dots, 2n - l.$$

The condition $\alpha \geq \pm\frac{i}{2}\tilde{\Delta}_D$ implies $I - \frac{1}{4}S^T S \geq 0$, hence $0 < s_j \leq 2$. 

Let $(Z_D, \Delta_D)$ be the standard symplectic space of dimensionality $2l + 2(2n - 2l) = 
2(2n - l)$, with the basis $\tilde{e}_j, \tilde{h}_j;\ j = 1, \dots, 2n - l$. Define $\alpha_D$ as a form with the 
diagonal matrix 

$$\alpha_D \tilde{e}_j = s_j^{-1}\tilde{e}_j, \quad \alpha_D \tilde{h}_j = s_j^{-1}\tilde{h}_j; \quad j = 1, \dots, l;$$

$$\alpha_D \tilde{e}_j = \tilde{e}_j, \quad \alpha_D \tilde{h}_j = \frac{1}{4}\tilde{h}_j; \quad j = l + 1, \dots, 2n - l.$$

In this case, $\alpha_D \geq \pm\frac{i}{2}\Delta_D$. Define the operator $K_D: Z_B \to Z_D$ by the formula 

$$K_D e_j = \sqrt{s_j}\,\tilde{e}_j, \quad K_D h_j = \sqrt{s_j}\,\tilde{h}_j; \quad j = 1, \dots, l;$$

$$K_D e_j = \tilde{e}_j, \quad j = l + 1, \dots, 2n - l. \tag{12.122}$$

Now, relations (12.120) and (12.121) follow from the consideration of the values of 
the corresponding quadratic forms on the basis vectors $e_j;\ j = 1, \dots, 2n - l$; $h_j$; 
$j = 1, \dots, l$ in $Z_B$. Note that, by construction, $\alpha_D$ is the covariance matrix of a pure 
state if and only if $s_j = 2;\ j = 1, \dots, l$. 

If, on the other hand, $\alpha$ vanishes on a nontrivial subspace $Z_0 \subset Z_B$, then $\tilde{\Delta}_D$ 
also vanishes on $Z_0$ due to the condition $\alpha \geq \pm\frac{i}{2}\tilde{\Delta}_D$. Therefore, the vectors $e_j$; 
$j = l + 1, \dots, 2n - l$ can be chosen in such a way that part of them will form a basis 
in $Z_0$. In this case, definition (12.122) can be modified by requiring $K_D e_j = 0$ for 
$e_j \in Z_0$, and relations (12.120) and (12.121) will be satisfied. □ 
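
For one input and one output mode the criterion (12.119) becomes a scalar test. The sketch below assumes $\hbar = 1$ and $\Delta_A = \Delta_B = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, and it uses the elementary identity $K^T \Delta K = (\det K)\Delta$ for $2 \times 2$ matrices, so that (12.119) reads $\alpha \geq \pm\frac{i}{2}(1 - \det K)\Delta$. The function name and the sample parameters are illustrative and are not taken from the text.

```python
from typing import List

Matrix = List[List[float]]


def is_one_mode_gaussian_channel(K: Matrix, alpha: Matrix,
                                 tol: float = 1e-12) -> bool:
    """One-mode form of (12.119): alpha >= 0 and det(alpha) >= (1 - det K)^2 / 4.

    With c = (1 - det K) / 2, the Hermitian matrix
    [[a, b + i c], [b - i c, d]] is nonnegative definite iff
    a >= 0, d >= 0 and a d - b^2 - c^2 >= 0.
    """
    det_k = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    c = 0.5 * (1.0 - det_k)
    a, b, d = alpha[0][0], alpha[0][1], alpha[1][1]
    return a >= -tol and d >= -tol and a * d - b * b >= c * c - tol


if __name__ == "__main__":
    half = [[0.5, 0.0], [0.0, 0.5]]          # K = I/2, det K = 1/4
    print(is_one_mode_gaussian_channel(half, [[0.375, 0.0], [0.0, 0.375]]))
    print(is_one_mode_gaussian_channel(half, [[0.3, 0.0], [0.0, 0.3]]))
    double = [[2.0, 0.0], [0.0, 2.0]]        # K = 2I with no added noise
    print(is_one_mode_gaussian_channel(double, [[0.0, 0.0], [0.0, 0.0]]))
```

In the first call the noise matrix sits exactly on the boundary of (12.119); by the remark in the proof this corresponds to a pure state of the environment.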


12.4.2 Gaussian channels: basic properties 


Formula (12.111) is equivalent to the following equation for the dual channel 

$$\Phi^*[W_B(z_B)] = W_A(Kz_B)\, f(z_B), \tag{12.123}$$

which shows that the Weyl operators are mapped into Weyl operators, up to a factor. 
In the case of a Gaussian channel 

$$\Phi^*[W_B(z_B)] = W_A(Kz_B) \exp\left[i\, l(z_B) - \frac{1}{2}\alpha(z_B, z_B)\right]. \tag{12.124}$$

The following statement shows that Gaussian channels can be defined abstractly by 
relation (12.124) without reference to the picture of interaction with the environment 
considered in the previous section. 


Proposition 12.31. Condition (12.119) is necessary and sufficient for the mapping 
(12.124) to be completely positive. 

Proof. Sufficiency was established in Theorem 12.30. To prove necessity, note that 
complete positivity of the mapping (12.123) implies nonnegative definiteness of the 
matrices with operator elements 

$$W(Kz_s)\, \Phi^*[W(z_s)^* W(z_r)]\, W(Kz_r)^* = f(z_r - z_s) \exp\left(-\frac{i}{2}\left[\Delta(z_r, z_s) - \Delta(Kz_r, Kz_s)\right]\right), \tag{12.125}$$

where $z_1, \dots, z_n$ is an arbitrary finite subset of $Z_B$. In the Gaussian case (12.124) this 
is equivalent to nonnegative definiteness of Hermitian matrices with elements 

$$\exp\left\{\alpha(z_r, z_s) - \frac{i}{2}\Delta(z_r, z_s) + \frac{i}{2}\Delta(Kz_r, Kz_s)\right\}, \tag{12.126}$$

and the condition (12.119) follows from Lemma 12.18. □ 

We will make use of the following properties of Gaussian channels. 

i. A Gaussian channel transforms Gaussian states into Gaussian states. The 
parameters of the input state are transformed according to the relations 

$$m_B = m_A K + l,$$

$$\alpha_B = K^T \alpha_A K + \alpha, \tag{12.127}$$

compare with relation (12.118). 

ii. The dual of a linear bosonic channel transforms any polynomial in the canonical 
observables $R_B$ into a polynomial in $R_A$ of the same order, provided the function 
$f$ has derivatives of sufficiently high order. This property follows by 
differentiating relation (12.123) at the point $z_B = 0$. 

iii. A concatenation of Gaussian channels is a Gaussian channel. Indeed, let $\Phi_j;\ j = 
1, 2$, be two Gaussian channels with parameters $K_j, l_j, \alpha_j$. In this case, applying 
the definition (12.124), we obtain the Gaussian channel $\Phi_2 \circ \Phi_1$ with parameters 

$$K = K_1 K_2,$$

$$l = K_2^T l_1 + l_2, \tag{12.128}$$

$$\alpha = K_2^T \alpha_1 K_2 + \alpha_2.$$

iv. Any linear bosonic, in particular Gaussian, channel is covariant in the sense that 

$$\Phi[W(z)^* S\, W(z)] = W(K'z)^*\, \Phi[S]\, W(K'z), \tag{12.129}$$

where $K' = \Delta^{-1} K^T \Delta$ is the symplectic adjoint of $K$. 
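
Properties i. and iii. can be cross-checked against each other: sending the parameters $(m, \alpha)$ of a Gaussian state through $\Phi_1$ and then $\Phi_2$ by (12.127) should give the same result as one application of the composite parameters (12.128). The sketch below does this with plain nested lists. It is an illustration only: the type aliases, function names and numerical parameters are assumptions of this example, and the forms $l$ and $m$ are stored as row vectors, so that $K_2^T l_1 + l_2$ appears as `l1 K2 + l2`.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]
Channel = Tuple[Matrix, Vector, Matrix]  # parameters (K, l, alpha) of (12.124)


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def madd(x: Matrix, y: Matrix) -> Matrix:
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def row_times(v: Vector, m: Matrix) -> Vector:
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v)))
            for j in range(len(m[0]))]


def apply_channel(ch: Channel, m_in: Vector,
                  a_in: Matrix) -> Tuple[Vector, Matrix]:
    """State parameters at the output, relations (12.127)."""
    K, l, alpha = ch
    m_out = [a + b for a, b in zip(row_times(m_in, K), l)]
    a_out = madd(matmul(matmul(transpose(K), a_in), K), alpha)
    return m_out, a_out


def compose(ch1: Channel, ch2: Channel) -> Channel:
    """Parameters of Phi_2 after Phi_1, relations (12.128)."""
    K1, l1, a1 = ch1
    K2, l2, a2 = ch2
    K = matmul(K1, K2)
    l = [a + b for a, b in zip(row_times(l1, K2), l2)]
    alpha = madd(matmul(matmul(transpose(K2), a1), K2), a2)
    return K, l, alpha


if __name__ == "__main__":
    ch1 = ([[0.5, 0.0], [0.0, 0.5]], [0.1, -0.2], [[0.375, 0.0], [0.0, 0.375]])
    ch2 = ([[0.8, 0.3], [-0.3, 0.8]], [0.0, 0.4], [[0.2, 0.0], [0.0, 0.2]])
    state = ([1.0, -2.0], [[1.5, 0.2], [0.2, 0.7]])
    step_by_step = apply_channel(ch2, *apply_channel(ch1, *state))
    in_one_go = apply_channel(compose(ch1, ch2), *state)
    print(step_by_step)
    print(in_one_go)
```

The two printed results should coincide up to rounding. The code only manipulates parameters; it does not check that each $(K, l, \alpha)$ satisfies (12.119).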

12.4.3 Gaussian observables 

Assume that we have two symplectic spaces $Z_A, Z_B$ with the corresponding Weyl 
systems in Hilbert spaces $\mathcal{H}_A, \mathcal{H}_B$. Let $M$ be an observable in $\mathcal{H}_A$ with the outcome 
set $Z_B$, given by the probability operator-valued measure $M(d^{2n}z)$. In what follows, 
we occasionally skip the index $B$ so that $Z = Z_B$ etc. The observable is completely 
determined by the operator characteristic function 

$$\phi_M(w) = \int e^{i\Delta(w,z)}\, M(d^{2n}z).$$

Note the following apparent property of the operator characteristic function. For 
any choice of a finite subset $\{w_j\} \subset Z_B$, the block matrix with operator entries 
$\phi_M(w_j - w_k)$ is nonnegative definite. 

An observable $M$ will be called Gaussian if its operator characteristic function has 
the form 

$$\phi_M(w) = W_A(Kw) \exp\left(-\frac{1}{2}\alpha(w, w)\right) = \exp\left(i R_A K w - \frac{1}{2}\alpha(w, w)\right), \tag{12.130}$$

where $K: Z_B \to Z_A$ is a linear operator and $\alpha$ is a bilinear form on $Z_B$. The pair 
$(K, \alpha)$ defines the parameters of the Gaussian observable. 

Theorem 12.32. A necessary and sufficient condition for relation (12.130) to define
an observable is the matrix inequality

$$ \alpha \ge \pm\frac{i}{2} K^T \Delta_A K. \tag{12.131} $$

Proof. For an operator function given by (12.130),

$$ \phi_M(w_j - w_k) = C_k^* C_j \exp\left[\alpha(w_j,w_k) - \frac{i}{2}\Delta(Kw_j,Kw_k)\right], $$

where $C_j = W_A(Kw_j)\exp\left[-\tfrac12\alpha(w_j,w_j)\right]$, and nonnegative definiteness of matrices
with scalar entries $\exp\left[\alpha(w_j,w_k) - \tfrac{i}{2}\Delta(Kw_j,Kw_k)\right]$ implies inequality (12.131),
according to Lemma 12.18.

Sufficiency of the condition (12.131) will be established by a direct construction of
the Naimark extension of an observable $M$ in the spirit of Corollary 3.8.

Proposition 12.33. Assume condition (12.131). Then one can find a bosonic system
in the space $\mathcal{H}_C$ with canonical observables $R_C$, for which $\mathcal{H}_B \subset \mathcal{H}_A \otimes \mathcal{H}_C$, and
the Gaussian state $S_C \in \mathfrak{S}(\mathcal{H}_C)$, such that

$$ M(U) = \operatorname{Tr}_C (I_A \otimes S_C)\, E_{AC}(U), \qquad U \subset Z_B, \tag{12.132} $$

where $E_{AC}$ is the sharp observable in the space $\mathcal{H}_A \otimes \mathcal{H}_C$, given by the spectral
measure of commuting self-adjoint operators

$$ X_B = (R_A K + R_C K_C)\,\Delta_B^{-1} \tag{12.133} $$

(here we again abbreviate the notations as $R_A$ instead of $R_A \otimes I_C$ etc.), where
$K_C : Z_B \to Z_C$ is an operator such that

$$ K_C^T \Delta_C K_C = -K^T \Delta_A K. \tag{12.134} $$

Proof. Condition (12.134) means that $K_C^T\Delta_CK_C + K^T\Delta_AK = 0$, that is, commutativity
of the operators defined by (12.133). By adapting the proof of Theorem 12.30,
we obtain a symplectic space $(Z_C, \Delta_C)$, an operator $K_C : Z_B \to Z_C$, and an inner
product in $Z_C$, given by a symmetric matrix $\alpha_C \ge \pm\tfrac{i}{2}\Delta_C$, such that (12.134) holds
along with

$$ K_C^T \alpha_C K_C = \alpha. $$

Then the characteristic function of the observable $E_{AC}$ is

$$ \begin{aligned} \phi_{E_{AC}}(w) &= \int e^{-i\Delta(w,z)}\, E_{AC}(d^{2n}z)\\ &= \exp\left(iX_B\Delta_Bw\right) = \exp i\,(R_AK + R_CK_C)w\\ &= W_A(Kw)\,W_C(K_Cw), \end{aligned} $$

whence

$$ \begin{aligned} \operatorname{Tr}_C (I_A \otimes S_C)\,\phi_{E_{AC}}(w) &= W_A(Kw)\exp\left(-\tfrac12\alpha_C(K_Cw,K_Cw)\right)\\ &= W_A(Kw)\exp\left(-\tfrac12\alpha(w,w)\right) = \phi_M(w), \end{aligned} $$

and (12.132) follows. □

An observable $M$ is sharp if and only if $\alpha = 0$, in which case it is the spectral
measure of the commuting self-adjoint operators $R_AK$.

Example 12.34. If $R = [q\ p]$ are the canonical observables of one mode $A$, and
$R_C = [q_C\ p_C]$ those of another mode $C$, the operators $X_B = [q'\ p']$, defined as

$$ q' = q - q_C, \qquad p' = p + p_C, $$

are commuting and essentially self-adjoint. If $S_C$ is an elementary Gaussian state (12.79)
with parameters $(0,\alpha)$, we have an extension, in the sense of Proposition 12.33, of the
Gaussian observable $M$ with parameters $(\Delta,\alpha)$. In quantum optics, the observable
$M$ describes the statistics of optical heterodyning.

On the other hand, a single self-adjoint operator $X = k_q q + k_p p$, where $k_q, k_p$
are real, describes the sharp Gaussian observable with parameters $([k_q\ k_p]^T, 0)$
corresponding to optical homodyning. See, e.g. [38], [107] for more detail.
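The commutativity claimed in Example 12.34 can be checked with a few lines of code. This is a minimal sketch under stated assumptions: linear forms in the canonical observables are represented by coefficient vectors ordered $(q, p, q_C, p_C)$, and the commutator of two such forms is taken to be proportional to the symplectic form $\Delta(z,z') = \sum_j (x_j y'_j - x'_j y_j)$, so it vanishes exactly when $\Delta$ does. The function name `delta` is hypothetical.

```python
# Sketch: two linear forms in (q, p, q_C, p_C) commute iff their coefficient
# vectors have vanishing symplectic form (assumed convention, see lead-in).

def delta(z, zp):
    """Symplectic form for coefficient vectors ordered (x_1, y_1, x_2, y_2, ...)."""
    return sum(z[i] * zp[i + 1] - zp[i] * z[i + 1] for i in range(0, len(z), 2))

q_prime = [1, 0, -1, 0]   # q' = q - q_C
p_prime = [0, 1, 0, 1]    # p' = p + p_C

print(delta(q_prime, p_prime))            # heterodyne pair q', p'
print(delta([1, 0, 0, 0], [0, 1, 0, 0]))  # the original pair q, p
```

The contributions of the two modes cancel for the pair $q', p'$, whereas the pair $q, p$ of a single mode has a nonzero form; this is why the second mode $C$ is needed to measure both quadratures jointly.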


12.4.4 Gaussian entanglement-breaking channels 

Theorem 12.35. A Gaussian channel $\Phi$ with parameters $(K, 0, \alpha)$ is entanglement-breaking
if and only if $\alpha$ admits the decomposition

$$ \alpha = \alpha_A + \alpha_B, \quad \text{where} \quad \alpha_A \ge \pm\frac{i}{2}K^T\Delta_AK, \quad \alpha_B \ge \pm\frac{i}{2}\Delta_B. \tag{12.135} $$

In this case, $\Phi$ has the representation

$$ \Phi[S] = \int_{Z_B} W_B(z)\, S_B\, W_B(z)^*\, \mu_S(d^{2n}z), \tag{12.136} $$

where $S_B$ is the Gaussian state with parameters $(0,\alpha_B)$, and $\mu_S(U) = \operatorname{Tr} S M_A(U)$,
$U \subset Z_B$, is the probability distribution of the Gaussian observable $M_A$ with characteristic
function

$$ \phi_{M_A}(w) = W_A(Kw)\exp\left(-\tfrac12\alpha_A(w,w)\right). \tag{12.137} $$

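Proof. First, assume that $\alpha$ admits the decomposition (12.135) and consider the channel
defined by (12.136). We have to show that

$$ \Phi^*[W_B(w)] = W_A(Kw)\exp\left(-\tfrac12\alpha(w,w)\right). \tag{12.138} $$

Indeed, for an arbitrary state $S$

$$ \begin{aligned} \operatorname{Tr} S\,\Phi^*[W_B(w)] &= \operatorname{Tr}\Phi[S]\,W_B(w) = \int_{Z_B}\operatorname{Tr} W_B(z)S_BW_B(z)^*W_B(w)\,\mu_S(d^{2n}z)\\ &= \int_{Z_B}\operatorname{Tr} S_BW_B(z)^*W_B(w)W_B(z)\,\mu_S(d^{2n}z)\\ &= \operatorname{Tr} S_BW_B(w)\int_{Z_B}\exp[-i\Delta(w,z)]\,\mu_S(d^{2n}z)\\ &= \exp\left(-\tfrac12\alpha_B(w,w)\right)\operatorname{Tr} S\,\phi_{M_A}(w) \qquad (12.139)\\ &= \operatorname{Tr} S\,W_A(Kw)\exp\left(-\tfrac12\alpha_B(w,w) - \tfrac12\alpha_A(w,w)\right), \end{aligned} $$

whence (12.138) follows.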

Conversely, let the channel $\Phi$ be Gaussian and entanglement-breaking. We will use
the Gaussian version of the proof of Theorem 11.31. Fix a nondegenerate Gaussian
state $\sigma_A$ in $\mathfrak{S}(\mathcal{H}_A)$ and let $\{|e_j\rangle\}$ be the basis of the eigenvectors of $\sigma_A$, with the
corresponding (positive) eigenvalues $\{\lambda_j\}$. Consider the unit vector

$$ |\Omega\rangle = \sum_{j=1}^{+\infty}\sqrt{\lambda_j}\,|e_j\rangle \otimes |e_j\rangle $$

in the space $\mathcal{H}_A \otimes \mathcal{H}_A$. Then $|\Omega\rangle\langle\Omega|$ is the Gaussian purification of $\sigma_A$. Since $\Phi$ is
entanglement-breaking, the Gaussian state

$$ \sigma_{AB} = (\mathrm{Id}_A \otimes \Phi)[|\Omega\rangle\langle\Omega|] \tag{12.140} $$

in $\mathfrak{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$ is separable. This implies representation (12.102), i.e.

$$ \sigma_{AB} = \int_{Z_A}\int_{Z_B} W_A(z_A)S_AW_A(z_A)^* \otimes W_B(z_B)S_BW_B(z_B)^*\, P(d^{2m}z_A\, d^{2n}z_B), \tag{12.141} $$

where $S_A, S_B$ are Gaussian states with covariance matrices $\alpha_A, \alpha_B$, and $P$ is a Gaussian
probability distribution.

Taking the partial trace with respect to $B$, this produces

$$ \begin{aligned} \sigma_A &= \operatorname{Tr}_B (\mathrm{Id}_A \otimes \Phi)[|\Omega\rangle\langle\Omega|]\\ &= \int_{Z_A}\int_{Z_B} W_A(z_A)S_AW_A(z_A)^*\, P(d^{2m}z_A\, d^{2n}z_B)\\ &= \int_{Z_A}\int_{Z_B} \overline{W_A(z_A)S_AW_A(z_A)^*}\, P(d^{2m}z_A\, d^{2n}z_B), \end{aligned} $$

where the bar means complex conjugation in the basis of eigenvectors of $\sigma_A$. Now,
similar to (11.54), we conclude that the relation

$$ M_A(U) = \sigma_A^{-1/2}\left[\int_{Z_A}\int_{U} \overline{W_A(z_A)S_AW_A(z_A)^*}\, P(d^{2m}z_A\, d^{2n}z_B)\right]\sigma_A^{-1/2} $$

defines a bounded positive operator, such that $M_A(U) \le M_A(Z_B) = I_A$. It is easy to
see that $M_A(U)$ is a probability operator-valued measure on Borel subsets $U \subset Z_B$.

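Let us show that representation (12.136) holds for the channel $\Phi$ with these $M_A$ and
$S_B$. Consider the entanglement-breaking channel

$$ \tilde\Phi[S] = \int_{Z_B} W_B(z)S_BW_B(z)^*\,\mu_S(d^{2n}z), $$

where $\mu_S(U) = \operatorname{Tr} S M_A(U)$; $U \subset Z_B$. To prove $\tilde\Phi = \Phi$, it is sufficient to show that
$\tilde\Phi[|e_j\rangle\langle e_k|] = \Phi[|e_j\rangle\langle e_k|]$ for all $j,k$. But

$$ \begin{aligned} \tilde\Phi[|e_j\rangle\langle e_k|] &= \int_{Z_B} W_B(z)S_BW_B(z)^*\,\langle e_k|M_A(d^{2n}z)|e_j\rangle\\ &= \lambda_j^{-1/2}\lambda_k^{-1/2}\int_{Z_A}\int_{Z_B}\langle e_j|W_A(z_A)S_AW_A(z_A)^*|e_k\rangle\, W_B(z_B)S_BW_B(z_B)^*\, P(d^{2m}z_A\, d^{2n}z_B)\\ &= \lambda_j^{-1/2}\lambda_k^{-1/2}\operatorname{Tr}_A\left(|e_k\rangle\langle e_j| \otimes I_B\right)\sigma_{AB} = \Phi[|e_j\rangle\langle e_k|], \end{aligned} $$

according to (12.140), (12.141).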

It remains for us to show that $M_A$ is a Gaussian observable, with the characteristic
function (12.137), where $\alpha_A = \alpha - \alpha_B$, and $\alpha_B$ is the covariance matrix of the state
$S_B$; without loss of generality, we can assume that its mean is zero. Indeed, the density
operator $S_B$ can be modified with the help of the Weyl operators to have zero mean,
resulting only in a change in the probability distribution (12.141) which, however, will
remain Gaussian. But, from (12.139),

$$ \Phi^*[W_B(w)] = \exp\left(-\tfrac12\alpha_B(w,w)\right)\phi_{M_A}(w) $$

for any channel $\Phi$ with the representation (12.136), whence, taking into account
(12.138), we indeed obtain (12.137), with $\alpha_A = \alpha - \alpha_B$. □


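A necessary condition for the decomposability (12.135) and hence for the channel
to be entanglement-breaking is

$$ \alpha \ge \frac{i}{2}\left(\Delta_B \pm K^T\Delta_AK\right). \tag{12.142} $$

In general, this condition means that for any input Gaussian state of the channel $\mathrm{Id}_A \otimes \Phi$,
the output has a positive partial transpose. Indeed, this channel transforms the
covariance matrix of the input state according to the rule

$$ \begin{bmatrix}\alpha_{11} & \alpha_{12}\\ \alpha_{21} & \alpha_{22}\end{bmatrix} \to \begin{bmatrix} I & 0\\ 0 & K\end{bmatrix}^T\begin{bmatrix}\alpha_{11} & \alpha_{12}\\ \alpha_{21} & \alpha_{22}\end{bmatrix}\begin{bmatrix} I & 0\\ 0 & K\end{bmatrix} + \begin{bmatrix} 0 & 0\\ 0 & \alpha\end{bmatrix}. $$

The right hand side, representing the covariance matrix of the output state, satisfies

$$ \alpha_{AB} \ge \frac{i}{2}\begin{bmatrix} I & 0\\ 0 & K\end{bmatrix}^T\begin{bmatrix}\Delta_A & 0\\ 0 & \Delta_A\end{bmatrix}\begin{bmatrix} I & 0\\ 0 & K\end{bmatrix} + \frac{i}{2}\begin{bmatrix}0 & 0\\ 0 & \pm\Delta_B - K^T\Delta_AK\end{bmatrix} = \frac{i}{2}\begin{bmatrix}\Delta_A & 0\\ 0 & \pm\Delta_B\end{bmatrix}, $$

where in the estimate of the second term we used (12.142) with its transpose. However,
this is equivalent to condition (12.103), which is necessary and sufficient for the
output state to have a positive partial transpose. Thus, condition (12.142) characterizes
the class of Gaussian PPT channels, which is in general larger than the class of
entanglement-breaking channels.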
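For a single mode, both the channel condition (12.119) and the PPT condition (12.142) reduce to a determinant test, which the following sketch illustrates. It rests on assumptions that are not spelled out at this point of the text: $\Delta = \begin{bmatrix}0&1\\-1&0\end{bmatrix}$, so that $K^T\Delta K = (\det K)\,\Delta$ for any real $2\times2$ matrix $K$, and a real symmetric $\alpha$ satisfies $\alpha \ge \pm\tfrac{i}{2}c\,\Delta$ exactly when $\alpha \ge 0$ and $\det\alpha \ge c^2/4$. The function names and the attenuator example are hypothetical.

```python
# One-mode sketch of the matrix inequalities alpha >= +-(i/2) c Delta.
# Function names and the example channel are illustrative assumptions.

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def dominates(alpha, c):
    """True if alpha - (i/2) c Delta is nonnegative definite (2x2 Hermitian test)."""
    return alpha[0][0] >= 0 and alpha[1][1] >= 0 and det2(alpha) >= c * c / 4

def is_channel(K, alpha):
    """Condition (12.119) for one mode: c = 1 - det K."""
    return dominates(alpha, 1 - det2(K))

def is_ppt(K, alpha):
    """Condition (12.142) with all sign choices for one mode: c = 1 + |det K|."""
    return dominates(alpha, 1 + abs(det2(K)))

def attenuator(k, n0):
    """Assumed example: K = k I, alpha = (1 - k^2)(N_0 + 1/2) I."""
    a = (1 - k * k) * (n0 + 0.5)
    return [[k, 0.0], [0.0, k]], [[a, 0.0], [0.0, a]]

for n0 in (0.1, 0.5):
    K, alpha = attenuator(0.5, n0)
    print(n0, is_channel(K, alpha), is_ppt(K, alpha))
```

In this example both noise levels give a legitimate channel, but only the noisier one passes the PPT test, in line with the remark that (12.142) is a genuinely stronger requirement than (12.119).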

The condition of the theorem is automatically satisfied in the special case where

$$ K^T\Delta_AK = 0. $$

In this case, the operators $R_AK$ commute and hence $M_A$ is a sharp observable given
by their spectral measure, and the probability distribution $\mu_S(d^{2n}z)$ can be arbitrarily
sharply peaked around any point $z$ by an appropriate choice of the state $S$. Hence, in
this case, it is natural to identify $\Phi$ as a c-q (classical-quantum) channel determined
by the family of states $z \to W(z)S_BW(z)^*$.

12.5 The capacities of Gaussian channels

12.5.1 Maximization of the mutual information 

We start with the classical entanglement-assisted capacity, which can be computed
most effectively in the Gaussian case. When the state $S$ and the channel $\Phi$ are Gaussian,
the quantities $H(S)$, $H(S,\Phi)$ and $I(S,\Phi)$ can be found by using formulas
(12.97), (12.118), (12.105). Namely, $H(\Phi[S])$ is given by formula (12.97),
with $\alpha$ replaced by $\alpha' = \alpha_B$ computed via (12.118). By purifying the input state and
using formula (7.49), we obtain

$$ H(S,\Phi) = \frac12\operatorname{Sp}\, g\left(\operatorname{abs}\left(\Delta_{12}^{-1}\alpha_{12}\right) - \frac{I}{2}\right), $$

where the matrix

$$ \alpha_{12} = \begin{bmatrix}\alpha' & K^T\beta\\ \beta^TK & \alpha\end{bmatrix} \tag{12.143} $$

is computed by inserting $R'_B$, given by (12.116), into

$$ \alpha_{12} - \frac{i}{2}\Delta_{12} = \operatorname{Tr}\,[R'_B\ R_2]^T S\,[R'_B\ R_2], $$

where $R_2$ are the (unchanged) canonical observables of the reference system. Alternatively,
if an explicit description of the complementary channel $\tilde\Phi$ is available, the
entropy exchange can be calculated as its output entropy $H(\tilde\Phi[S])$.

The following result greatly simplifies the computation of the entanglement-assisted
capacity of Gaussian channels, by reducing it to the case of Gaussian input states.

Theorem 12.36. Let $\Phi$ be a Gaussian channel. The maximum of the mutual information
$I(S,\Phi)$ over the set of states $\mathfrak{S}(m,\alpha)$ with given first and second moments is
achieved on a Gaussian state.

Proof. By using the representation in (7.53) for the quantum mutual information, we
can write

$$ I(S,\Phi) = H(B|E) + H(B), $$

where $B$ is the output of the channel and $E$ is the environment. For a Gaussian
channel, the first and second moments are transformed in the same way for all states
in $\mathfrak{S}(m,\alpha)$. By using Lemma 12.25, we have

$$ H(B) = H(\Phi[S]) \le H(\widetilde{\Phi[S]}) = H(\Phi[\tilde S]) = H(\tilde B), $$

where $\widetilde{\Phi[S]}$ (respectively, $\tilde S$) is the Gaussian state with the same first and second
moments as $\Phi[S]$ (resp. $S$). The channel $S \to S_{BE} = VSV^*$, where $V$ is the
Stinespring isometry for $\Phi$, is also Gaussian, since it is implemented by a unitary
operator $U_T$ with a Gaussian state of the environment, namely

$$ \operatorname{Tr} S_{BE}\,W(z_B) \otimes W(z_E) = \phi_A(Kz_B + Lz_E)\,\phi_D(K_Dz_B + L_Dz_E). \tag{12.144} $$

Therefore, a similar argument, based on Lemma 12.26, implies that $H(B|E) \le H(\tilde B|\tilde E)$.
Thus $I(S,\Phi) \le I(\tilde S,\Phi)$. □

This theorem implies that if the maximum of $I(S,\Phi)$ over a set of density operators
defined by an arbitrary constraint on the first and second moments is achieved, this is
achieved on a Gaussian density operator. Now, consider the energy constraint

$$ \operatorname{Tr} SF \le E, \tag{12.145} $$

where $F = R\epsilon R^T$ is the quadratic operator with positive, nondegenerate energy
matrix $\epsilon$. Notice that

$$ \operatorname{Tr} SF = \operatorname{Sp}(\epsilon\alpha_A) + m_A\epsilon m_A^T, $$

where $m_A$ is the mean vector and $\alpha_A$ is the covariance matrix of the state $S$. The
expression for the entanglement-assisted classical capacity $C_{ea}(\Phi)$ given by Proposition 11.26
is the maximum of the quantum mutual information $I(S,\Phi)$ over all states
$S$ satisfying the constraint (12.145). The set of conditions for this proposition is satisfied
for the operator $F = R\epsilon R^T$. Indeed, by Theorem 12.22, the operator $F$ satisfies
condition (11.9). Taking

$$ \tilde F = c\left[RR^T - \left(\operatorname{Sp}\alpha_EK_E^TK_E\right)I\right], $$

we have $\Phi^*[\tilde F] = cRK^TKR^T$ and we can always choose positive $c$ such that
$\Phi^*[\tilde F] \le F$. Moreover, $\tilde F$ satisfies condition (11.9). Therefore, the maximum of the
quantity $I(S,\Phi)$ is achieved and formula (11.41) holds. Since the energy constraint
can be expressed in terms of $m_A, \alpha_A$, the maximal value $C_{ea}(\Phi,F,E)$ is achieved on
the Gaussian state.
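The expression of the mean energy through the first and second moments, $\operatorname{Tr} SF = \operatorname{Sp}(\epsilon\alpha_A) + m_A\epsilon m_A^T$, is easy to evaluate directly. The sketch below is only an illustration: the function name `mean_energy` is hypothetical, and the example values (energy matrix $\epsilon = \tfrac12 I_2$ and covariance $\tfrac12 I_2$ for one mode) are assumptions chosen for the demonstration.

```python
# Sketch: mean energy Tr S F from the moments (m, alpha) of the state S,
# for the quadratic operator F = R eps R^T (m is treated as a row vector).

def mean_energy(eps, alpha, m):
    n = len(m)
    trace_term = sum(eps[i][j] * alpha[j][i] for i in range(n) for j in range(n))
    mean_term = sum(m[i] * eps[i][j] * m[j] for i in range(n) for j in range(n))
    return trace_term + mean_term   # Sp(eps alpha) + m eps m^T

eps = [[0.5, 0.0], [0.0, 0.5]]      # assumed one-mode energy matrix
alpha = [[0.5, 0.0], [0.0, 0.5]]    # assumed covariance matrix

print(mean_energy(eps, alpha, [0.0, 0.0]))   # zero-mean state
print(mean_energy(eps, alpha, [1.0, 1.0]))   # same covariance, displaced mean
```

Since $\epsilon$ is positive, a nonzero mean can only increase the energy at fixed covariance, which is the observation used later when passing to zero-mean (gauge-invariant) states.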

Further reduction of the maximization problem can be obtained by invoking gauge 
covariance. 

12.5.2 Gauge-covariant channels 

Consider a channel $\Phi : \mathfrak{T}(\mathcal{H}_A) \to \mathfrak{T}(\mathcal{H}_B)$. Assume that in the spaces $Z_A, Z_B$
some operators of complex structure $J_A, J_B$ are fixed and let $G_A, G_B$ be the operators
generating the unitary groups of gauge transformations in $\mathcal{H}_A, \mathcal{H}_B$, according to
formula (12.73). The channel is called gauge covariant, if

$$ \Phi\left[e^{i\varphi G_A}Se^{-i\varphi G_A}\right] = e^{i\varphi G_B}\,\Phi[S]\,e^{-i\varphi G_B} \tag{12.146} $$

for all input states $S$ and all $\varphi \in [0,2\pi]$. In the case of a linear bosonic channel $\Phi$,
due to (12.73) and (12.111), this is equivalent to the following

$$ \phi\left(e^{-\varphi J_A}Kz_B\right)f(z_B) = \phi\left(Ke^{-\varphi J_B}z_B\right)f\left(e^{-\varphi J_B}z_B\right), $$

where $\phi(z_A)$ is the characteristic function of the state $S$. For the Gaussian channel
with parameters $(K, 0, \alpha)$ this reduces to

$$ KJ_B - J_AK = 0, \qquad \left[\Delta_B^{-1}\alpha, J_B\right] = 0. \tag{12.147} $$

Thus, a natural choice of the complex structure in $Z_B$ is given by any $J_B$ commuting
with the operator $\Delta_B^{-1}\alpha$. Existence of such a complex structure is proved similarly
to (12.86), with the difference that the matrix $\alpha$ can be degenerate.
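The two relations in (12.147) can be tested mechanically for a given pair $(K,\alpha)$. The sketch below does this for one mode under assumptions that go beyond the text at this point: the symplectic form is taken as $\Delta(z,z') = xy' - x'y$, i.e. $\Delta = \begin{bmatrix}0&1\\-1&0\end{bmatrix}$, and both complex structures are taken to be $J = \begin{bmatrix}0&-1\\1&0\end{bmatrix}$ (multiplication by $i$ on $x + iy$). The helper names are hypothetical.

```python
# One-mode sketch of the gauge-covariance conditions (12.147):
#   K J - J K = 0   and   [Delta^{-1} alpha, J] = 0.
# Delta, J and all helper names are assumptions stated in the lead-in.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

DELTA_INV = [[0.0, -1.0], [1.0, 0.0]]   # inverse of Delta = [[0, 1], [-1, 0]]
J = [[0.0, -1.0], [1.0, 0.0]]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]

def is_zero(m, tol=1e-12):
    return all(abs(x) <= tol for row in m for x in row)

def is_gauge_covariant(K, alpha):
    return (is_zero(commutator(K, J))
            and is_zero(commutator(matmul(DELTA_INV, alpha), J)))

identity = [[1.0, 0.0], [0.0, 1.0]]
print(is_gauge_covariant([[0.5, 0.0], [0.0, 0.5]], identity))    # K = k I
print(is_gauge_covariant([[1.0, 0.0], [0.0, -1.0]], identity))   # K = diag(1, -1)
```

A scalar $K$ with isotropic noise passes both tests, while the reflection $K = \operatorname{diag}(1,-1)$ fails the first one, since it anticommutes with $J$ rather than commuting with it.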

In optimization problems with bounded mean energy, a natural complex structure
in $Z_A$ is determined by the energy operator $F = R\epsilon R^T$, namely, $J_A$ is the operator
of complex structure in $Z_A$ commuting with the operator $\epsilon\Delta_A$, so that

$$ J_A\epsilon + \epsilon J_A^T = 0. \tag{12.148} $$

In the case where $F$ is the usual Hamiltonian of the oscillator system, the action of
$J_A$ reduces to multiplication by $i$ in the complexification associated with
creation-annihilation operators (see (12.63)).

Now, let the Gaussian channel $\Phi$ be gauge covariant with respect to these natural
complex structures. In this case,

$$ I\left(e^{i\varphi G_A}Se^{-i\varphi G_A}, \Phi\right) = I(S,\Phi), $$

which follows from the similar property of the three terms that comprise $I(S,\Phi)$.
For $H(S)$, this is simply a consequence of the unitary invariance of the entropy, for
$H(\Phi[S])$ the covariance of the channel $\Phi$ is additionally used, and for $H(S,\Phi) = H(\tilde\Phi[S])$
this follows from the covariance of the complementary channel $\tilde\Phi$ (Exercise 6.38).
Defining the average $G_A$-invariant state

$$ \bar S = \frac{1}{2\pi}\int_0^{2\pi} e^{i\varphi G_A}Se^{-i\varphi G_A}\,d\varphi, $$

we have

$$ \operatorname{Tr} SF = \operatorname{Sp}(\epsilon\alpha_A) + m_A\epsilon m_A^T \ge \operatorname{Sp}(\epsilon\alpha_A) = \operatorname{Tr}\bar SF, $$

where $m_A, \alpha_A$ are the first and the second moments of the state $S$ and the last equality
follows because of (12.148):

$$ \begin{aligned} \operatorname{Tr}\bar SF &= \frac{1}{2\pi}\int_0^{2\pi}\operatorname{Tr}\, e^{i\varphi G_A}Se^{-i\varphi G_A}\left(R\epsilon R^T\right)d\varphi\\ &= \frac{1}{2\pi}\int_0^{2\pi}\operatorname{Sp}\left(e^{\varphi J_A}\epsilon\, e^{\varphi J_A^T}\alpha_A\right)d\varphi = \operatorname{Sp}(\epsilon\alpha_A). \end{aligned} $$

Now, using the concavity of the mutual information $I(S,\Phi)$, we conclude that
$I(S,\Phi) \le I(\bar S,\Phi)$. Thus, the maximum of the quantum mutual information over the
set of states with bounded mean energy $\operatorname{Tr} SF \le E$ is attained on a gauge-invariant
($G_A$-invariant) Gaussian state. The first and second moments of such a state necessarily
satisfy the relations $m_A = 0$, $[J_A, \Delta_A^{-1}\alpha_A] = 0$. Considering the Gaussian state
with these first and second moments, we finally obtain the following corollary.

Corollary 12.37. Let $\Phi$ be a Gaussian channel that is gauge-covariant with respect
to the natural complex structures $J_A, J_B$ (where $J_A$ is associated with the energy
operator). In this case, the maximum of the quantum mutual information over the set
of states with bounded mean energy is attained on a gauge-invariant ($G_A$-invariant)
Gaussian state.

12.5.3 Maximization of the coherent information 

Consider the coherent information

$$ I_c(S,\Phi) = H(\Phi[S]) - H(S,\Phi). $$

Proposition 12.38. Let $\Phi$ be a Gaussian channel that is degradable,

$$ \tilde\Phi = \Gamma \circ \Phi, \tag{12.149} $$

such that $\Gamma$ is a Gaussian channel. In this case, the quantum capacity

$$ Q(\Phi) = \sup_S I_c(S,\Phi), $$

where the supremum is taken over Gaussian states $S$. If, in addition, the channels
are gauge-covariant, then the supremum can be taken only over gauge-invariant
($G_A$-invariant) Gaussian states.

Proof. Assuming that the channel is degradable, i.e. (12.149) holds, where $\tilde\Phi$ is the
complementary channel and $\Gamma$ is some channel, we have (see Section 10.3.3)

$$ I_c(S,\Phi) = H(E'|E) = H(S_{E'E}) - H(S_E), \tag{12.150} $$

where $S_{E'E} = V'S_BV'^*$ and $V' : \mathcal{H}_B \to \mathcal{H}_E \otimes \mathcal{H}_{E'}$ is the minimal Stinespring
isometry for the channel $\Gamma$. Now if the channel $\Gamma$ can be chosen Gaussian, the channel
$S_B \to S_{E'E}$ is also Gaussian and the argument invoking Lemma 12.26 applies.
Therefore, $I_c(S,\Phi) = H(E'|E) \le H(\tilde E'|\tilde E) = I_c(\tilde S,\Phi)$. By using Proposition 10.27,
it follows that the quantum capacity of the degradable channel is given by

$$ Q(\Phi) = \sup_S I_c(S,\Phi) = \sup_{\tilde S} I_c(\tilde S,\Phi), $$

where the supremum is taken over Gaussian states $\tilde S$.

If, in addition, $\Phi$ is gauge covariant, the concavity of the conditional quantum
entropy (12.150) implies that the supremum can be taken only over gauge-invariant
Gaussian states. □

12.5.4 The classical capacity: conjectures 

It is natural to consider the classical capacity of a quantum Gaussian channel $\Phi$ under
the additive input constraint (11.28) with

$$ F^{(n)} = F \otimes I \otimes \cdots \otimes I + \cdots + I \otimes \cdots \otimes I \otimes F, $$

where $F = R\epsilon R^T$ is the quadratic energy operator with positive-definite matrix $\epsilon$.
However, finding the classical capacity $C(\Phi,F,E)$ in general depends on the solution
of the problem of the additivity of the constrained $\chi$-capacity (11.33), which is still
open for the Gaussian channels with quadratic input constraints. In any case, the
$\chi$-capacity $C_\chi(\Phi,F,E)$, defined by relation (11.31), is a lower estimate for $C(\Phi,F,E)$
that coincides with $C(\Phi,F,E)$ in the case of additivity, e.g. for entanglement-breaking
channels. A related open problem is the additivity of the minimal output entropy (8.33)
in the class of Gaussian channels.

However, even the computation of $C_\chi(\Phi,F,E)$ for Gaussian channels remains,
in general, an open problem (except for the case of c-q channels and a special case,
which will be considered in Proposition 12.46). At least, in this situation an optimal
generalized ensemble always exists, because the conditions of Corollary 11.24
for $F = R\epsilon R^T$ are satisfied, as was shown in Section 12.5.1. Thus, the maximum
of the quantity $\chi_\Phi(\pi)$ defined in (11.35) is achieved, and $C_\chi(\Phi,F,E)$ is given by
relation (11.38).

Consider the following hypothesis of Gaussian optimal ensembles: For a Gaussian
channel with the quadratic input energy constraint, the maximum of the quantity
$\chi_\Phi(\pi)$ in expression (11.38) for $C_\chi(\Phi,F,E)$ is attained on the Gaussian generalized
ensemble $\pi$ that consists of generalized coherent states $W(z)S_0W(z)^*$, where $S_0$ is a
pure Gaussian state, with a Gaussian probability distribution $P(d^{2n}z)$.

For such an ensemble the covariance property (12.129) implies

$$ H\left(\Phi[W(z)S_0W(z)^*]\right) = H(\Phi[S_0]), $$

hence

$$ \chi_\Phi(\pi) = H(\Phi[\bar S_\pi]) - H(\Phi[S_0]), \tag{12.151} $$

which leads us to the hypothesis of the Gaussian minimizer for the output entropy:

For a Gaussian channel $\Phi$, the minimum of the output entropy is attained on a (pure)
Gaussian state $S_0$.

Consider the gauge transformations in $\mathcal{H}_A$ that are associated with an operator of
complex structure. Define their action on the generalized ensembles by the formula

$$ \pi_\varphi(U) = \pi\left(\{S : e^{i\varphi G_A}Se^{-i\varphi G_A} \in U\}\right), \qquad \varphi \in [0,2\pi], $$

for Borel subsets $U \subseteq \mathfrak{S}(\mathcal{H}_A)$. A generalized ensemble is gauge-invariant, if $\pi_\varphi = \pi$.
By using the concavity of the functional $\chi_\Phi(\pi)$ (Exercise 11.23) and arguing as in
Section 12.5.2, one can show that if the Gaussian channel is gauge covariant with
respect to the complex structure associated with the energy matrix $\epsilon$, the maximum
in (11.38) is attained on a gauge-invariant generalized ensemble. Again, it follows
that the average state $\bar S_\pi$ of such an optimal ensemble is gauge-invariant.

However, under the same assumption one can prove the following proposition, 
relating the two hypotheses for gauge-covariant channels. 

Proposition 12.39. Let a Gaussian channel $\Phi$ be gauge-covariant with respect to the
complex structures $J_A, J_B$. Assume that the minimum of the output entropy is attained
on a $G_A$-invariant Gaussian state $S_0$. In this case, the hypothesis of optimal Gaussian
ensembles is valid, and the optimal ensemble $\tilde\pi$ can be chosen such that the output
state $\tilde S_B = \Phi[\bar S_{\tilde\pi}]$ is a $G_B$-invariant Gaussian state.

Proof. It was already observed that an optimal ensemble $\pi$ exists. Let $S_B = \Phi[\bar S_\pi]$.
Now, $\operatorname{Tr}\bar S_\pi F \le E$ and

$$ \begin{aligned} C_\chi(\Phi,F,E) = \chi_\Phi(\pi) &= H(S_B) - \int H(\Phi[S])\,\pi(dS)\\ &\le H(S_B) - \check H(\Phi) = H(S_B) - H(\Phi[S_0]), \end{aligned} \tag{12.152} $$

by the assumption $H(\Phi[S_0]) = \check H(\Phi)$, where $\check H(\Phi)$ denotes the minimal output entropy.

Consider a $G_A$-invariant state

$$ S_A = \frac{1}{2\pi}\int_0^{2\pi} e^{i\varphi G_A}\,\bar S_\pi\, e^{-i\varphi G_A}\,d\varphi, $$

then $\operatorname{Tr} S_AF = \operatorname{Tr}\bar S_\pi F$. Now, let $\tilde S_A$ be the Gaussian state with the same first and
second moments as $S_A$. In this case, again,

$$ \operatorname{Tr}\tilde S_AF = \operatorname{Tr} S_AF = \operatorname{Tr}\bar S_\pi F. \tag{12.153} $$

Moreover, $\tilde S_A$ is a $G_A$-invariant Gaussian state and $\tilde S_B = \Phi[\tilde S_A]$ is a $G_B$-invariant
Gaussian state with the same first and second moments as $\Phi[S_A]$, due to the Gaussianity
of $\Phi$. Moreover,

$$ H(\tilde S_B) \ge H(\Phi[S_A]) \ge H(\Phi[\bar S_\pi]) = H(S_B). \tag{12.154} $$

Here, the first inequality follows from the principle of maximal entropy (Lemma 12.25)
while the second follows from the concavity of the entropy. Since $S_0$ is a pure
$G_A$-invariant Gaussian state, one has the decomposition (12.88) for the state

$$ \tilde S_A = \int W(z)S_0W(z)^*\,P(d^{2s}z). $$

Denote by $\tilde\pi$ the ensemble of generalized coherent states $W(z)S_0W(z)^*$ with Gaussian
distribution $P(d^{2s}z)$. Then

$$ \tilde S_B = \Phi[\tilde S_A] = \int\Phi[W(z)S_0W(z)^*]\,P(d^{2s}z) = \int\Phi[S]\,\tilde\pi(dS) $$

and

$$ \int H(\Phi[S])\,\tilde\pi(dS) = H(\Phi[S_0]) = \check H(\Phi). $$

By relation (12.153), the ensemble $\tilde\pi$ satisfies the energy constraint. Moreover,

$$ \chi_\Phi(\tilde\pi) = H(\tilde S_B) - \int H(\Phi[S])\,\tilde\pi(dS) = H(\tilde S_B) - H(\Phi[S_0]). \tag{12.155} $$

Bringing together relations (12.152), (12.154), and (12.155), we obtain
$\chi_\Phi(\tilde\pi) \ge \chi_\Phi(\pi) = C_\chi(\Phi,F,E)$. Therefore, $\tilde\pi$ is the optimal Gaussian ensemble having the
required properties. □

The condition of Proposition 12.39 is satisfied in the special case where there exists
a $G_A$-invariant Gaussian state $S_0$ such that $\Phi[S_0]$ is pure, so that the minimal output
entropy $\check H(\Phi) = 0$. In this direction, we have the following result.

Proposition 12.40. Let the Gaussian channel $\Phi$ be gauge-covariant with respect to
the complex structures $J_A, J_B$. Assume that the parameters of the channel $(K, 0, \alpha)$
satisfy the following conditions.

i. the matrix $\Delta_K = \Delta_B - K^T\Delta_AK$ is nondegenerate;

ii. $J_B$ is an operator of complex structure in the symplectic space $(Z_B, \Delta_K)$;

iii. $\alpha = \tfrac12\Delta_KJ_B$.

Let $S_0$ be the pure $G_A$-invariant Gaussian state with parameters $(0,\alpha_A)$, where
$\alpha_A = \tfrac12\Delta_AJ_A$. Then the hypothesis of optimal Gaussian ensembles is valid, with the
optimal Gaussian ensemble $\tilde\pi$ described in the proof of Proposition 12.39.

Moreover, the additivity property (11.33) holds for this channel, and

$$ C(\Phi,F,E) = C_\chi(\Phi,F,E) = \max_{S:\,\operatorname{Tr} SF \le E} H(\Phi[S]), \tag{12.156} $$

where the maximum is attained on a $G_A$-invariant Gaussian state.

Proof. Recall that condition ii. requires that $\Delta_KJ_B = -J_B^T\Delta_K$ is a positive definite
matrix. Moreover, statement iv. in Exercise 12.19 implies that $\alpha = \tfrac12\Delta_KJ_B$ is the
covariance matrix of a pure Gaussian state over $(Z_B, \Delta_K)$ and hence satisfies
$\alpha \ge \pm\tfrac{i}{2}\Delta_K$. But this is just the necessary and sufficient condition (12.119) for the triple
$(K, 0, \alpha)$ to define a Gaussian channel. Hence, condition iii. is consistent. Let us show
that $\Phi[S_0]$ is a pure Gaussian state with the parameters $(0,\alpha_B)$, where $\alpha_B = \tfrac12\Delta_BJ_B$,
which implies $\check H(\Phi) = H(\Phi[S_0]) = 0$ and hence the conclusion of Proposition 12.39.
According to (12.127), the covariance matrix of the output state $\Phi[S_0]$ is

$$ \alpha_B = K^T\alpha_AK + \alpha = \frac12\left(K^T\Delta_AJ_AK + \Delta_KJ_B\right). $$

The condition $J_AK = KJ_B$ (see (12.147)) and the definition of $\Delta_K$ imply that this is
equal to

$$ \alpha_B = \frac12\left(K^T\Delta_AKJ_B + \Delta_KJ_B\right) = \frac12\Delta_BJ_B, $$

which proves the first statement.
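The identity $K^T\alpha_AK + \alpha = \tfrac12\Delta_BJ_B$ just derived can be verified numerically in the simplest case. The sketch below assumes one mode, $\Delta_A = \Delta_B = \begin{bmatrix}0&1\\-1&0\end{bmatrix}$, $J_A = J_B = \begin{bmatrix}0&-1\\1&0\end{bmatrix}$ and $K = kI_2$ with $0 < k < 1$; these concrete choices and all helper names are illustrative assumptions, not part of the proposition.

```python
# Sketch: check K^T alpha_A K + alpha = (1/2) Delta J for a one-mode example
# satisfying conditions i.-iii. (assumed Delta, J and K = k I with 0 < k < 1).

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(r) for r in zip(*a)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

DELTA = [[0.0, 1.0], [-1.0, 0.0]]
J = [[0.0, -1.0], [1.0, 0.0]]

def output_covariance_of_s0(k):
    K = [[k, 0.0], [0.0, k]]
    alpha_a = scale(0.5, matmul(DELTA, J))                       # alpha_A = Delta J / 2
    delta_k = add(DELTA, scale(-1.0, matmul(matmul(transpose(K), DELTA), K)))
    alpha = scale(0.5, matmul(delta_k, J))                       # condition iii.
    return add(matmul(matmul(transpose(K), alpha_a), K), alpha)  # rule (12.127)

alpha_b = output_covariance_of_s0(0.6)
target = scale(0.5, matmul(DELTA, J))                            # (1/2) Delta J
print(alpha_b, target)
```

Up to rounding, the output covariance coincides with that of the input pure state, so the output is again pure in this example.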

To prove (12.156), we first notice that, by an argument similar to the one used to
deal with the maximum of the quantum mutual information in Section 12.5.1, the
maximum in (12.156) is indeed attained on a $G_A$-invariant Gaussian state $\tilde S_A$. Now,

$$ \max_{S:\,\operatorname{Tr} SF \le E} H(\Phi[S]) = H(\Phi[\tilde S_A]) = \chi_\Phi(\tilde\pi) = C_\chi(\Phi,F,E), \tag{12.157} $$

where $\tilde\pi$ is the Gaussian ensemble constructed from $\tilde S_A$, as in the proof of Proposition 12.39.
Now consider the channel $\Phi^{\otimes n}$, for which also the minimal output entropy
$\check H(\Phi^{\otimes n}) = 0$. Thus,

$$ \begin{aligned} nC_\chi(\Phi,F,E) &\le C_\chi\left(\Phi^{\otimes n},F^{(n)},nE\right)\\ &\le \max_{S^{(n)}:\,\operatorname{Tr} S^{(n)}F^{(n)} \le nE} H\left(\Phi^{\otimes n}[S^{(n)}]\right)\\ &\le n\max_{S:\,\operatorname{Tr} SF \le E} H(\Phi[S]). \end{aligned} $$

Here, the first and second inequalities follow from definition (11.31) of $C_\chi$ and the last
one from Lemma 11.20. Combined with (12.157), this implies the additivity property
$C_\chi(\Phi^{\otimes n},F^{(n)},nE) = nC_\chi(\Phi,F,E)$ and hence (12.156). □

12.6 The case of one mode 

12.6.1 Classification of Gaussian channels 

In this section we are interested in the problem of reducing an arbitrary quantum
Gaussian channel to the simplest "normal" form, which can be obtained by applying
suitable unitary operators, corresponding to symplectic transformations $T_1, T_2$ via the
formula (12.64), to the input and the output of the channel:

$$ \Phi'^*[W(z)] = U_{T_1}^*\,\Phi^*\left[U_{T_2}^*W(z)U_{T_2}\right]U_{T_1}. $$

That is,

$$ \Phi'^*[W(z)] = W(T_1KT_2z)\,f(T_2z). $$

Here, we provide a complete solution of this problem in one mode, $s = 1$.

Let $Z$ be the two-dimensional symplectic space, i.e. the real linear space of vectors
$z = [x\ y]^T$ with the symplectic form

$$ \Delta(z,z') = xy' - x'y. \tag{12.158} $$

A basis $e,h$ in $Z$ is symplectic if $\Delta(e,h) = 1$, i.e. if the area of the oriented
parallelogram based on $e,h$ is equal to 1. Recall that a linear transformation $T$ in $Z$
is symplectic if it maps a symplectic basis into a symplectic basis.

According to formula (12.124), a Gaussian channel is characterized by the parameters
$K, l, \alpha$ satisfying

$$ \alpha \ge \pm\frac{i}{2}\left[\Delta - K^T\Delta K\right]. \tag{12.159} $$

By way of a displacement transformation (12.52), we can always make $l = 0$, which
we assume. Thus,

$$ \Phi^*[W(z)] = W(Kz)\exp\left[-\tfrac12 z^T\alpha z\right]. \tag{12.160} $$

In the theorem below, we use the notation

$$ I_2 = \begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}. $$

Theorem 12.41. Let $\{e,h\}$ be a symplectic basis. Then depending on the value

A) $\Delta(Ke,Kh) = 0$; B) $\Delta(Ke,Kh) = 1$;

C) $\Delta(Ke,Kh) = k^2 > 0$, $k \ne 1$; D) $\Delta(Ke,Kh) = -k^2 < 0$

there are symplectic transformations $T_1, T_2$, such that the channel $\Phi'$ has the form
(12.160), with

A_1) $K = 0$; $\alpha = \left(N_0 + \tfrac12\right)I_2$; $N_0 \ge 0$;

A_2) $K = \begin{bmatrix}1 & 0\\ 0 & 0\end{bmatrix}$; $\alpha = \left(N_0 + \tfrac12\right)I_2$;

B_1) $K = I_2$; $\alpha = \tfrac12\begin{bmatrix}0 & 0\\ 0 & 1\end{bmatrix}$;

B_2) $K = I_2$; $\alpha = N_cI_2$; $N_c \ge 0$;

C) $K = kI_2$; $k > 0$, $k \ne 1$; $\alpha = \left(N_c + \tfrac{|k^2-1|}{2}\right)I_2$;

D) $K = k\begin{bmatrix}1 & 0\\ 0 & -1\end{bmatrix}$; $k > 0$; $\alpha = \left(N_c + \tfrac{k^2+1}{2}\right)I_2$.

Apparently, in the cases A_1), B_2), C), the channel is gauge-covariant with respect
to the natural complex structure associated with multiplication by $i$ in the complex
plane $x,y$.
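The case distinction of Theorem 12.41 depends only on the number $\Delta(Ke,Kh)$. For the standard symplectic basis $e = [1\ 0]^T$, $h = [0\ 1]^T$ and the form (12.158), this number equals $\det K$, so the class can be read off with a short function. The sketch below is illustrative; the name `classify` and the numerical tolerance are assumptions.

```python
# Sketch: class A/B/C/D of a one-mode Gaussian channel from its matrix K,
# using Delta(Ke, Kh) = det K for e = [1, 0]^T, h = [0, 1]^T.

def classify(K, tol=1e-12):
    d = K[0][0] * K[1][1] - K[0][1] * K[1][0]   # Delta(Ke, Kh)
    if abs(d) <= tol:
        return "A"      # K = 0 or K of rank one
    if abs(d - 1) <= tol:
        return "B"      # K is symplectic
    return "C" if d > 0 else "D"

print(classify([[0.0, 0.0], [0.0, 0.0]]))
print(classify([[1.0, 0.0], [0.0, 1.0]]))
print(classify([[2.0, 0.0], [0.0, 2.0]]))
print(classify([[1.0, 0.0], [0.0, -1.0]]))
```

The subcases A_1), A_2) and B_1), B_2) are not distinguished by this number alone; they depend on the rank of $K$ and on the noise matrix $\alpha$, respectively.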


Proof.

A) $\Delta(Ke,Kh) = 0$. In this case, $\Delta(Kz,Kz') = 0$ and either $K = 0$ or $K$ has
rank one. Now, inequality (12.159) is just the condition on the covariance matrix
of a quantum Gaussian state. As follows from Lemma 12.12, there is a symplectic
transformation $T_2$ such that

$$ \alpha(T_2z,T_2z) = \left(N_0 + \tfrac12\right)(x^2 + y^2). \tag{12.161} $$

In the case where $K = 0$, we just have A_1).

Let $K$ have rank one. In this case, $KT_2$ has rank one and there is a vector $e'$ such
that $KT_2[x\ y]^T = (k_1x + k_2y)\,e'$. Now, there is a symplectic transformation $T_1$
such that $T_1e' = [1\ 0]^T$ and hence $T_1KT_2[x\ y]^T = [k_1x + k_2y\ \ 0]^T$. By making a
rotation $T_2'$ that leaves $\alpha$ unchanged, we can transform this vector to $[k_1'x\ \ 0]^T$ with
$k_1' \ne 0$, and then, by a symplectic scaling (squeezing) $T_1'$, we arrive at case A_2).

B, C) $\Delta(Ke,Kh) = k^2 > 0$. Then $T = k^{-1}K$ is a symplectic transformation and
$\Delta(Kz,Kz') = k^2\Delta(z,z')$. Let $T_1 = (TT_2)^{-1}$, where $T_2$ will be chosen later, so
that $T_1KT_2 = kI_2$.

In case B), $k = 1$ and condition (12.159) is just the positive definiteness of $\alpha$. We
now have the subcases

B_2) If $\alpha$ is nondegenerate, by Lemma 12.12 there is a symplectic transformation $T_2$
such that

$$ \alpha(T_2z,T_2z) = N_c(x^2 + y^2), $$

where $N_c > 0$. Also, if $\alpha = 0$, we have a similar formula with $N_c = 0$.

B_1) On the other hand, if $\alpha$ is degenerate of rank one, $\alpha(z,z) = (k_1x + k_2y)^2$ for
some $k_1, k_2$ simultaneously not equal to zero, and then for arbitrary $N_c > 0$ there is a
symplectic transformation $T_2$ such that

$$ \alpha(T_2z,T_2z) = N_cx^2. $$

In particular, we can take $N_c = \tfrac12$.

In case C), $k \ne 1$, and condition (12.159) implies that $\alpha/|k^2 - 1|$ is a covariance
matrix of a quantum Gaussian state. Hence, there exists a symplectic transformation
$T_2$ such that

$$ \alpha(T_2z,T_2z) = |k^2 - 1|\left(N_0 + \tfrac12\right)(x^2 + y^2) = \left(N_c + \tfrac{|k^2-1|}{2}\right)(x^2 + y^2), $$

where $N_0 \ge 0$, $N_c = |k^2 - 1|N_0$.

D) $\Delta(Ke,Kh) = -k^2 < 0$. Then $T = k^{-1}K$ is an antisymplectic transformation,
which means that $\Delta(Tz,Tz') = -\Delta(z,z')$ and $\Delta(Kz,Kz') = -k^2\Delta(z,z')$. Similar
to cases B, C), we obtain

$$ \alpha(T_2z,T_2z) = (k^2 + 1)\left(N_0 + \tfrac12\right)(x^2 + y^2) = \left(N_c + \tfrac{k^2+1}{2}\right)(x^2 + y^2), $$

with $N_0 \ge 0$, $N_c = (k^2 + 1)N_0$. Letting $T_1 = \Lambda(TT_2)^{-1}$, where $\Lambda[x\ y]^T = [x\ \ {-y}]^T$,
we obtain the first equation in D). Note that $T_1$ is symplectic, since both $T$
and $\Lambda$ are antisymplectic. □

As explained in Section 12.4.1, a Gaussian channel $\Phi$ can be dilated to a linear
dynamics of an open bosonic system described by the symplectic transformation of
the canonical variables $q, p$ and the ancillary canonical variables $q_E, p_E, \dots$ in a
Gaussian state $S_E$. Moreover, this linear dynamics also provides a description of the
weak complementary channel $\Phi_E$ mapping the initial state of the system $q, p$ into
the final state of the environment $q'_E, p'_E, \dots$. In the case where the state $S_E$ is pure,
$\Phi_E$ is just the complementary channel $\tilde\Phi$, which is determined by $\Phi$ up to unitary
equivalence. Note that the environment may have more than one mode (which is
reflected by introducing $\dots$ in the environment variables).

Let us provide such a description in terms of the input-output relation in each of the
cases of Theorem 12.41.

A_1) This is the completely depolarizing channel

$$ q' = q_E, \qquad p' = p_E, $$

where $q_E, p_E$ describe the environment in the quantum thermal state $S_E$ with mean
number of quanta $N_0$. Its weak complementary is the ideal channel Id.

A_2) The input-output relation is

$$ q' = q + q_E, \qquad p' = p_E, \tag{12.162} $$

where $q_E, p_E$ are again in the quantum thermal state $S_E$ with mean number of quanta
$N_0$. The equation describes a degenerate classical signal $q$ with additive quantum
Gaussian noise. The weak complementary to this channel is described by the transformation

$$ q'_E = q, \qquad p'_E = p - p_E, \tag{12.163} $$

where $p_E$ can be regarded as a classical real Gaussian variable with variance $N_0 + \tfrac12$.

B_1) The equation of the channel has the form (12.163), where $p_E$ has variance $\tfrac12$,
so that the mode $q_E, p_E$ is in the pure (vacuum) state, and the complementary channel
is given by (12.162). This is the quantum signal plus degenerate (two-dimensional)
classical Gaussian noise.

$B_2$) Channel with (nondegenerate) additive classical Gaussian noise

$$q' = q + \xi, \qquad p' = p + \eta,$$

where $\xi, \eta$ are i.i.d. Gaussian random variables with zero mean and variance $N_c$.

Exercise 12.42. Check that the Stinespring dilation for this channel can be obtained by introducing an environment with two bosonic modes $q_j, p_j$; $j = 1, 2$ in a pure Gaussian state having zero means, zero covariances, and the variances

$$\mathsf{D}q_1 = \mathsf{D}q_2 = N_c; \qquad \mathsf{D}p_1 = \mathsf{D}p_2 = \frac{1}{4N_c},$$

so that $\xi = q_1$, $\eta = q_2$, and with the following dynamics for the canonical observables:

$$\begin{aligned}
q' &= q + q_1, \\
p' &= p + q_2, \\
q'_1 &= q_1, \\
p'_1 &= p_1 - p - q_2/2, \\
q'_2 &= q_2, \\
p'_2 &= p_2 + q + q_1/2.
\end{aligned}$$
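
The dynamics of Exercise 12.42 can be checked numerically. The following Python sketch is an added illustration rather than part of the original exposition. It assumes the ordering $(q, p, q_1, p_1, q_2, p_2)$ of the canonical variables and units in which $[q, p] = i$; the helper name `commutation_matrix` is hypothetical. It verifies that the linear map preserves the commutation relations and that the stated variances saturate the uncertainty relation, as required for a pure environment state.

```python
import numpy as np

def commutation_matrix(n_modes):
    """Matrix J with [R_a, R_b] = i * J[a, b] for R = (q, p, q1, p1, ...)."""
    block = np.array([[0.0, 1.0], [-1.0, 0.0]])
    return np.kron(np.eye(n_modes), block)

# Rows express the primed variables through (q, p, q1, p1, q2, p2).
T = np.array([
    [1.0,  0.0, 1.0, 0.0,  0.0, 0.0],  # q'  = q + q1
    [0.0,  1.0, 0.0, 0.0,  1.0, 0.0],  # p'  = p + q2
    [0.0,  0.0, 1.0, 0.0,  0.0, 0.0],  # q1' = q1
    [0.0, -1.0, 0.0, 1.0, -0.5, 0.0],  # p1' = p1 - p - q2/2
    [0.0,  0.0, 0.0, 0.0,  1.0, 0.0],  # q2' = q2
    [1.0,  0.0, 0.5, 0.0,  0.0, 1.0],  # p2' = p2 + q + q1/2
])

J = commutation_matrix(3)
# The primed variables obey the same commutation relations iff T J T^T = J.
preserves_ccr = np.allclose(T @ J @ T.T, J)

N_c = 0.7  # arbitrary illustrative noise level (assumption of this sketch)
Dq, Dp = N_c, 1.0 / (4.0 * N_c)
# A zero-covariance Gaussian mode is pure iff Dq * Dp = 1/4 in these units.
saturates_uncertainty = bool(np.isclose(Dq * Dp, 0.25))

print(preserves_ccr, saturates_uncertainty)
```

The check covers only the algebraic part of the exercise; identifying the reduced dynamics with the channel $B_2$) still requires the argument with characteristic functions.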

C) Attenuation/amplification channel with coefficient $k$ and quantum noise with mean number of quanta $N_0$. In the attenuation case ($k < 1$), the input-output relation is

$$q' = kq + \sqrt{1-k^2}\,q_E, \qquad p' = kp + \sqrt{1-k^2}\,p_E,$$

where $q_E, p_E$ are in the quantum thermal state $S_E$, with mean number of quanta $N_0$. The weak complementary is given by the equations

$$q'_E = \sqrt{1-k^2}\,q - kq_E, \qquad (12.164)$$

$$p'_E = \sqrt{1-k^2}\,p - kp_E, \qquad (12.165)$$

and is again an attenuation channel (with coefficient $k' = \sqrt{1-k^2}$).

In the amplification case ($k > 1$), we have

$$q' = kq + \sqrt{k^2-1}\,q_E, \qquad p' = kp - \sqrt{k^2-1}\,p_E,$$

with the weak complementary

$$q'_E = \sqrt{k^2-1}\,q + kq_E, \qquad p'_E = -\sqrt{k^2-1}\,p + kp_E,$$

see case D).
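
As a quick consistency check, added here as an illustration, one can verify that the amplifier relations preserve the canonical commutation relations. The computation assumes the convention $[q, p] = [q_E, p_E] = i$ and that the system and environment variables commute:

$$[q', p'] = k^2[q, p] - (k^2-1)[q_E, p_E] = i\left(k^2 - (k^2-1)\right) = i,$$

$$[q', p'_E] = -k\sqrt{k^2-1}\,[q, p] + k\sqrt{k^2-1}\,[q_E, p_E] = 0.$$

The minus sign in front of $\sqrt{k^2-1}\,p_E$ is what makes the first commutator equal to $i$ for $k > 1$; in the attenuation case the analogous computation gives $k^2 + (1-k^2) = 1$.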

D) The input-output relation is

$$q' = kq + \sqrt{k^2+1}\,q_E, \qquad p' = -kp + \sqrt{k^2+1}\,p_E,$$

which is the same as the weak complementary to the amplification channel with coefficient $k' = \sqrt{k^2+1}$ and quantum noise with mean number of quanta $N_0$. This channel describes the attenuation/amplification with phase conjugation, $q, p \to q, -p$.

Finally, let us show how the c-q channel considered in Section 12.1.4 fits into the general description of quantum Gaussian channels. Since the input is two-dimensional classical, one has to use two bosonic input modes $q_1, p_1, q_2, p_2$ to describe it quantum-mechanically, so that, e.g., $m_q = q_1$, $m_p = q_2$. The environment is one mode $q, p$ in the Gaussian state $S_0$ and hence the output is given by the equations

$$q' = q_1 + q = q + m_q; \qquad p' = q_2 + p = p + m_p,$$

and the channel parameters are 

n o- 


L° o 

In this case, the full set of the Heisenberg equations is the same as in case $B_2$), with the roles of input modes and environmental modes interchanged.

12.6.2 Entanglement-breaking channels 

Let us apply Theorem 12.35 to the case of one bosonic mode $A = B$, where $\Delta_A(z,z') = \Delta_B(z,z') = \Delta(z,z')$ is given by the relation (12.158). As shown above, any one-mode Gaussian channel can be transformed to one of the normal forms.

We only have to find the form $K^{\top}\Delta K$ and check the decomposability (12.135) in each of these cases. We rely upon the simple fact:

Exercise 12.43. Show that

$$\left(N + \tfrac12\right) I_2 \ge \pm\tfrac{i}{2}\Delta \qquad (12.166)$$

if and only if $N \ge 0$.

With this, we proceed to consideration of each class. 

A) $K^{\top}\Delta K = 0$, hence $\Phi$ is a c-q (in fact essentially classical) channel.

B) $K^{\top}\Delta K = \Delta$, hence the necessary condition of decomposability (12.135) requires $\alpha \ge i\Delta$. This is never satisfied in case $B_1$) due to the degeneracy of $\alpha$. Thus, the channel is not entanglement-breaking (in fact it has infinite quantum capacity as shown in [105]). On the other hand, in case $B_2$) the condition (12.135) is satisfied with $\alpha_B = \alpha_A = \alpha/2$ if and only if $N_c \ge 1$. Hence, $\Phi$ is entanglement-breaking in this case.

C) $K^{\top}\Delta K = k^2\Delta$. It is clear that in this case the decomposability condition holds if and only if $\alpha \ge \pm\tfrac{i}{2}(1+k^2)\Delta$, which is equivalent to $N_c + \tfrac{|k^2-1|}{2} \ge \tfrac{1+k^2}{2}$, or

$$N_c \ge \min(1, k^2). \qquad (12.167)$$

This provides the condition for the entanglement-breaking (which also formally includes the case $B_2$).

D) $K^{\top}\Delta K = -k^2\Delta$. Again, the decomposability condition holds if and only if $\alpha \ge \pm\tfrac{i}{2}(1+k^2)\Delta$, which is always the case. Hence, the channel is entanglement-breaking for all $N_c \ge 0$.
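
The equivalence used in case C) can be illustrated numerically. The sketch below is an added example, not the book's method: it assumes that $\Delta$ is represented by the matrix $\begin{bmatrix}0 & -1\\ 1 & 0\end{bmatrix}$ and $\alpha$ by $(N_c + |k^2-1|/2)\,I_2$, and the function names are hypothetical. It tests positive semidefiniteness of $\alpha - \tfrac{i}{2}(1+k^2)\Delta$ directly and compares the result with the closed-form condition (12.167).

```python
import numpy as np

# Assumed matrix of the commutator form; the sign convention does not
# affect the spectrum of alpha -/+ (i/2)(1 + k^2) * DELTA.
DELTA = np.array([[0.0, -1.0], [1.0, 0.0]])

def satisfies_decomposability(k, n_c, tol=1e-12):
    """Check alpha >= (i/2)(1 + k^2) Delta for a class C channel."""
    alpha = (n_c + abs(k**2 - 1.0) / 2.0) * np.eye(2)
    gap = alpha - 0.5j * (1.0 + k**2) * DELTA  # Hermitian 2x2 matrix
    return bool(np.linalg.eigvalsh(gap).min() >= -tol)

def closed_form_condition(k, n_c):
    """Condition (12.167): N_c >= min(1, k^2)."""
    return n_c >= min(1.0, k**2)

agreement = all(
    satisfies_decomposability(k, n_c) == closed_form_condition(k, n_c)
    for k in (0.3, 0.8, 1.0, 1.7)
    for n_c in (0.05, 0.5, 0.95, 1.05, 2.0)
)
print(agreement)
```

The smallest eigenvalue of the tested matrix is $N_c - \min(1, k^2)$, which is why the two checks coincide away from the boundary.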

Thus, the additivity property (11.33) holds for one-mode Gaussian channels of the form A), D) with arbitrary parameters, and $B_2$), C) with parameters satisfying (12.167). In general, entanglement-breaking channels have zero quantum capacity, $Q(\Phi) = 0$, see Corollary 10.28. However, in Section 12.6.4 a broader domain of zero quantum capacity will be demonstrated based on degradability analysis.

Exercise 12.44. Use Exercise 12.43 to show that in the case of one mode, condition (12.142) implies (12.135) and hence every PPT channel is entanglement-breaking.

12.6.3 Attenuation/amplification/classical noise channel 

From the point of view of applications, cases C) and $B_2$) are the most interesting ones. The action of the channel $\Phi$ can in these cases be described by the single formula

$$\operatorname{Tr}\Phi[S]W(z) = \operatorname{Tr}SW(kz)\cdot\exp\left[-\tfrac12\left(N_c + |k^2-1|/2\right)(x^2+y^2)\right], \qquad (12.168)$$

where the parameter $N_c \ge 0$ is the power of the environment noise and $k > 0$ is the coefficient of attenuation/amplification in case C). In this case, $N_c = |k^2-1|N_0$, where $N_0$ is the mean photon number of the environment. The case $B_2$) of an additive classical Gaussian noise channel corresponds to the value $k = 1$. Obviously, the channel is gauge-covariant with respect to the natural complex structure associated with the multiplication by $i$ in the complex plane $x, y$. Therefore, in what follows, basing ourselves on the results of Section 12.5, we can restrict ourselves to the gauge-invariant states.

Let the input state $S = S(N)$ of the system be the elementary Gaussian with the characteristic function

$$\operatorname{Tr}S(N)W(z) = \exp\left[-\tfrac12\left(N + \tfrac12\right)(x^2+y^2)\right],$$

the parameter $N$ representing the power (mean number of quanta) of the signal

$$\operatorname{Tr}S(N)a^{\dagger}a = N. \qquad (12.169)$$

In this case, the entropy of $S(N)$ is

$$H(S(N)) = g(N). \qquad (12.170)$$

From (12.168) we find that the output state $\Phi[S]$ is the Gaussian state $S(N')$ where

$$N' = k^2N + N'_0 \qquad (12.171)$$

and the total noise

$$N'_0 = \max\{0, (k^2-1)\} + N_c \qquad (12.172)$$

is the output mean photon number corresponding to the vacuum input state $S(0)$. The first term is nonzero only for $k > 1$, when it represents the amplifier noise. The output entropy is equal to

$$H(\Phi[S(N)]) = g(N'). \qquad (12.173)$$

Now, we compute the entropy exchange $H(S(N), \Phi)$. The (pure) input state $S_{12}$ of the extended system $\mathcal{H}_1 \otimes \mathcal{H}_2$ is characterized by the $2\times2$-matrix (12.107). According to (12.143), the action of the extended channel $\Phi \otimes \mathrm{Id}$ transforms this matrix into

$$\begin{bmatrix}
0 & -(N'+1/2) & k\sqrt{N^2+N} & 0 \\
N'+1/2 & 0 & 0 & k\sqrt{N^2+N} \\
k\sqrt{N^2+N} & 0 & 0 & N+1/2 \\
0 & k\sqrt{N^2+N} & -(N+1/2) & 0
\end{bmatrix},$$

where $N'$ is given by (12.171). From formula (12.97), we deduce $H(S(N), \Phi) = g(|\lambda_1| - \tfrac12) + g(|\lambda_2| - \tfrac12)$, where $\pm\lambda_1, \pm\lambda_2$ are the eigenvalues of the matrix in the right-hand side. Solving the characteristic equation, we obtain

$$\lambda_{1,2} = \frac{i}{2}\left((N' - N) \pm D\right), \qquad (12.174)$$

where

$$D = \sqrt{(N + N' + 1)^2 - 4k^2N(N+1)}.$$

Hence,

$$H(S(N), \Phi) = g\left(\frac{D + N' - N - 1}{2}\right) + g\left(\frac{D - N' + N - 1}{2}\right). \qquad (12.175)$$
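
The solution of the characteristic equation can be cross-checked numerically. The following Python sketch is an added illustration with hypothetical function names and arbitrarily chosen parameter values. It builds the $4\times4$ matrix displayed above, computes its eigenvalues with NumPy, and compares their moduli with the values $|(N'-N) \pm D|/2$ that enter (12.175).

```python
import numpy as np

def output_quanta(n, k, n_c):
    """N' = k^2 N + max{0, k^2 - 1} + N_c, cf. (12.171)-(12.172)."""
    return k**2 * n + max(0.0, k**2 - 1.0) + n_c

def extended_matrix(n, k, n_c):
    """The 4x4 matrix obtained after applying the extended channel."""
    a = output_quanta(n, k, n_c) + 0.5
    b = n + 0.5
    c = k * np.sqrt(n**2 + n)
    return np.array([
        [0.0,  -a,   c, 0.0],
        [  a, 0.0, 0.0,   c],
        [  c, 0.0, 0.0,   b],
        [0.0,   c,  -b, 0.0],
    ])

def closed_form_moduli(n, k, n_c):
    """Moduli |lambda_1|, |lambda_2| obtained from (12.174)."""
    n_out = output_quanta(n, k, n_c)
    d = np.sqrt((n + n_out + 1.0)**2 - 4.0 * k**2 * n * (n + 1.0))
    return abs((n_out - n) + d) / 2.0, abs((n_out - n) - d) / 2.0

n, k, n_c = 1.5, 0.7, 0.2  # illustrative values (assumption of this sketch)
numeric = np.sort(np.abs(np.linalg.eigvals(extended_matrix(n, k, n_c))))
lam1, lam2 = closed_form_moduli(n, k, n_c)
expected = np.sort([lam1, lam1, lam2, lam2])  # each modulus occurs twice
matches = bool(np.allclose(numeric, expected))
print(matches)
```

Each modulus appears twice because the eigenvalues come in pairs $\pm\lambda_1$, $\pm\lambda_2$.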

Let us consider the classical capacities of the channel $\Phi$ under the additive input energy constraint (11.28) corresponding to the operator $F = a^{\dagger}a$. According to Theorem 12.36 and its corollary, the entanglement-assisted classical capacity

$$C_{ea}(\Phi, F, E) = \max_{S:\,\operatorname{Tr}Sa^{\dagger}a \le E} I(S, \Phi)$$

is attained on a gauge-invariant Gaussian state. Since for our channel the condition (11.42) is obviously satisfied, $\operatorname{Tr}Sa^{\dagger}a = E$ for this state, which is thus the elementary Gaussian state $S(E)$. Hence, the maximum is equal to

$$C_{ea}(\Phi, F, E) = I(S(E), \Phi) = H(S(E)) + H(\Phi[S(E)]) - H(S(E), \Phi),$$

where the entropies are given by relations (12.170), (12.173) and (12.175), with $N$ replaced by $E$.

Figure 12.1. Entropies: the output entropy (12.173), the entropy exchange (12.175) for the case $N_c = 0$.

Considering $C_{ea}(\Phi, F, E)$ as a function of the parameters $E, k, N_c$, it is interesting to compare it with the quantity $C_\chi(\Phi, F, E)$, which gives a lower bound for the classical capacity $C(\Phi, F, E)$ (and possibly coincides with it). If the hypothesis of Gaussian optimal ensembles holds true, the optimal ensemble consists of coherent states $S_\zeta = |\zeta\rangle\langle\zeta|$; $\zeta \in \mathbb{C}$, with the Gaussian probability density $p(\zeta) = (\pi E)^{-1}\exp(-|\zeta|^2/E)$, producing the value

$$C_\chi(\Phi, F, E) = g(k^2E + N'_0) - g(N'_0). \qquad (12.176)$$

The ratio

$$G = \frac{C_{ea}(\Phi, F, E)}{C_\chi(\Phi, F, E)} \qquad (12.177)$$

then gives at least an upper bound for the gain of using entanglement-assisted versus unassisted classical communication.


Exercise 12.45. Show that when the mean number of quanta $E$ in the signal tends to zero, while the total noise $N'_0 > 0$,

$$C_\chi(\Phi, F, E) \sim k^2E\log\left(\frac{N'_0+1}{N'_0}\right), \qquad C_{ea}(\Phi, F, E) \sim -k^2E\log E/(N'_0+1),$$

so that the entanglement gain $G$ tends to infinity as $-\log E$.

Figure 12.2. The entanglement-assistance gain (12.177) as a function of $k$ for $N_c = 0$. Parameter: the input energy $E$.

Now, consider the case $N'_0 = 0$, which corresponds to the ideal attenuator with zero noise $N_c = 0$ and attenuation coefficient $k < 1$, also called a pure loss channel (beam splitter) in quantum optics. According to (12.171) the vacuum state ($N = 0$) is reproduced by this channel, making the minimal output entropy equal to zero and hence additive. In fact, this case falls under the conditions of Proposition 12.40, implying

Corollary 12.46. [67] For the ideal attenuator, the additivity and the Gaussian optimizer conjectures hold, with

$$C(\Phi, F, E) = C_\chi(\Phi, F, E) = g(k^2E). \qquad (12.178)$$

Indeed, for this channel $N' = k^2E$ so that

$$\max_{S:\,\operatorname{Tr}Sa^{\dagger}a \le E} H(\Phi[S]) = H(\Phi[S(E)]) = g(k^2E).$$


□ 

In the case of the ideal attenuator, we find $D = (1-k^2)N + 1$, and the entropy exchange (12.175) is equal to $g((1-k^2)N)$, whence

$$C_{ea}(\Phi, F, E) = g(E) + g(k^2E) - g((1-k^2)E).$$

Thus, the gain of entanglement assistance is

$$G = 1 + \frac{g(E) - g((1-k^2)E)}{g(k^2E)}.$$

If the signal power $E \to 0$, using the asymptotic $g(E) \sim -E\log E$, we obtain $C(\Phi, F, E) \sim -k^2E\log E$, $C_{ea}(\Phi, F, E) \sim -2k^2E\log E$, so that $G \to 2$.
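
The behavior of the gain for the ideal attenuator can be explored numerically. The following Python sketch is an added illustration: it assumes the standard expression $g(x) = (x+1)\log(x+1) - x\log x$ with binary logarithms, and the function names are hypothetical. It evaluates the ratio $C_{ea}/C$ from the formulas above for decreasing signal power.

```python
import math

def g(x):
    """g(x) = (x + 1) log2(x + 1) - x log2(x), with g(0) = 0 (assumed form)."""
    if x <= 0.0:
        return 0.0
    return ((x + 1.0) * math.log1p(x) - x * math.log(x)) / math.log(2.0)

def gain_ideal_attenuator(k, e):
    """Ratio C_ea / C for the ideal attenuator (N_c = 0, k < 1)."""
    c = g(k**2 * e)
    c_ea = g(e) + g(k**2 * e) - g((1.0 - k**2) * e)
    return c_ea / c

for e in (1.0, 1e-3, 1e-6, 1e-9):
    print(e, gain_ideal_attenuator(0.5, e))
```

Since $g$ is concave with $g(0) = 0$, the gain never exceeds 2; the printed values approach this limit only logarithmically slowly as $E \to 0$.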

Expression (12.178) for the ideal attenuator can be used to provide a tight upper bound for the classical capacity of the channel defined by formula (12.168) with arbitrary parameters $k, N_c$. Indeed, denoting this channel by $\Phi_{k,N_c}$, one has

$$\Phi_{k,N_c} = \Phi_{\sqrt{N'_0+1},\,0} \circ \Phi_{k/\sqrt{N'_0+1},\,0}, \qquad (12.179)$$

where $N'_0$ is given by (12.172), so that $N'_0 + 1 \ge \max\{1, k^2\}$. In other words, the channel $\Phi_{k,N_c}$ can be represented as the ideal attenuator with coefficient $k/\sqrt{N'_0+1}$, followed by the ideal amplifier with coefficient $\sqrt{N'_0+1}$. This is a consequence of (12.168) and the transformation formula (12.171) for the signal power. By using the data-processing inequality

$$C(\Phi_2 \circ \Phi_1, F, E) \le C(\Phi_1, F, E), \qquad (12.180)$$

which is obtained similarly to inequality (10.36), but also taking into account the identical input constraints for the channels $\Phi_2 \circ \Phi_1$ and $\Phi_1$, we get

$$C(\Phi_{k,N_c}, F, E) \le C(\Phi_{k/\sqrt{N'_0+1},\,0}, F, E).$$

Applying formula (12.178) for the ideal attenuator $\Phi_{k/\sqrt{N'_0+1},\,0}$ and comparing it with (12.176) results in

$$g(k^2E + N'_0) - g(N'_0) \le C(\Phi_{k,N_c}, F, E) \le g(k^2E/(N'_0+1)). \qquad (12.181)$$

The above argument was provided in [137] (in the case of an attenuator, $k < 1$), where it was also shown that the difference between the functions in the left- and right-hand sides does not exceed $\log e \approx 1.45$ bits.
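
The two-sided bound (12.181) is easy to evaluate. The sketch below is an added illustration with hypothetical function names, assuming the standard form of $g$ with binary logarithms; it computes both sides for a few attenuator parameters chosen arbitrarily and reports the gap between them.

```python
import math

def g(x):
    """g(x) = (x + 1) log2(x + 1) - x log2(x), with g(0) = 0 (assumed form)."""
    if x <= 0.0:
        return 0.0
    return ((x + 1.0) * math.log1p(x) - x * math.log(x)) / math.log(2.0)

def total_noise(k, n_c):
    """N'_0 = max{0, k^2 - 1} + N_c, cf. (12.172)."""
    return max(0.0, k**2 - 1.0) + n_c

def capacity_bounds(k, n_c, e):
    """Lower and upper bounds of (12.181), in bits."""
    n0 = total_noise(k, n_c)
    lower = g(k**2 * e + n0) - g(n0)
    upper = g(k**2 * e / (n0 + 1.0))
    return lower, upper

for k, n_c, e in ((0.8, 0.5, 10.0), (0.5, 0.1, 1.0), (0.9, 2.0, 100.0)):
    lower, upper = capacity_bounds(k, n_c, e)
    print(k, n_c, e, lower, upper, upper - lower)
```

For $N_c = 0$ and $k < 1$ the two bounds coincide and reproduce (12.178).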


12.6.4 Estimating the quantum capacity 

In this section, the quantum capacity of the one-mode channel will be computed in several cases, based on the (anti-)degradability property (see Sections 10.3.3, 12.5.3). We use the fact that composition of one-mode Gaussian channels is again a one-mode Gaussian channel. In particular, the general rule (12.128) implies the following relations.

Denote by $\Phi_{C,k}$ (resp. $\Phi_{D,k}$) a channel of class C (resp. D) with coefficient $k$ and a fixed value of parameter $N_c$. In this case,

$$\Phi_{C,k_2} \circ \Phi_{C,k_1} = \Phi_{C,k_1k_2}, \quad \text{if } k_1, k_2 < 1, \qquad (12.182)$$

$$\Phi_{D,k_2} \circ \Phi_{C,k_1} = \Phi_{D,k_1k_2}, \quad \text{if } k_1 > 1. \qquad (12.183)$$


Proposition 12.47. For an attenuator with $k < \frac{1}{\sqrt2}$, the quantum capacity $Q(\Phi_{C,k}) = 0$. For an attenuation/amplification channel with $k > \frac{1}{\sqrt2}$ and $N_c = 0$,

$$Q(\Phi_{C,k}) = Q_G(\Phi_{C,k}) = \log\frac{k^2}{|k^2-1|}. \qquad (12.184)$$

Here

$$Q_G(\Phi) = \sup_{N \ge 0} I_c(S(N), \Phi),$$

where the supremum of the coherent information $I_c(S, \Phi) = H(\Phi[S]) - H(S, \Phi)$ is taken over all gauge-invariant Gaussian input states $S(N)$.

Figure 12.3. Entanglement breaking (EB) and the quantum capacity $Q = \log(k^2/|k^2-1|)$.


Proof. The proof will follow if we show that the attenuator with $k < \frac{1}{\sqrt2}$ is anti-degradable, while the attenuation/amplification channel with $k > \frac{1}{\sqrt2}$ and $N_c = 0$ is degradable.

Let $k < \frac{1}{\sqrt2}$. In this case, $k_1 = \frac{k}{\sqrt{1-k^2}} < 1$ and, taking into account that the weak complementary of $\Phi_{C,k}$ is

$$\tilde\Phi^{w}_{C,k} = \Phi_{C,\sqrt{1-k^2}}, \qquad (12.185)$$

see (12.164), relation (12.182) takes the form

$$\Phi_{C,k} = \Phi_{C,k_1} \circ \tilde\Phi^{w}_{C,k}, \qquad (12.186)$$

which implies that $\Phi_{C,k}$ is anti-degradable, since the weak complementary can be dilated to the complementary by purifying the environment state. Hence, by Proposition 10.27, its quantum capacity is equal to zero.

In case $N_c = 0$, when the environment state is pure, the complementary channel coincides with the weak complementary, $\tilde\Phi_{C,k} = \tilde\Phi^{w}_{C,k} = \Phi_{C,k'}$, where $k' = \sqrt{1-k^2} > \frac{1}{\sqrt2}$, and relation (12.186) is the same as

$$\tilde\Phi_{C,k'} = \Phi_{C,\,k/\sqrt{1-k^2}} \circ \Phi_{C,k'},$$

which means that $\Phi_{C,k'}$ is degradable for $\frac{1}{\sqrt2} < k' < 1$. Therefore, Proposition 12.38 applies, implying

$$Q(\Phi_{C,k}) = Q_G(\Phi_{C,k}). \qquad (12.187)$$

Similarly, in the amplifier case ($k > 1$, $N_c = 0$) relation (12.183) implies

$$\tilde\Phi_{C,k} = \Phi_{D,\,\sqrt{k^2-1}/k} \circ \Phi_{C,k},$$

whence $\Phi_{C,k}$ is degradable and (12.187) continues to hold. It remains for us to compute $Q_G(\Phi_{C,k})$.

In the case where $N_c > 0$, the coherent information

$$I_c(S(N), \Phi) = g(N') - g\left(\frac{D + N' - N - 1}{2}\right) - g\left(\frac{D - N' + N - 1}{2}\right) \qquad (12.188)$$

is a complicated function of the input power $N$. We have $I_c(S(0), \Phi) = 0$ and

$$\lim_{N\to\infty} I_c(S(N), \Phi) = \log k^2 - \log|k^2-1| - g\left(N_c/|k^2-1|\right), \quad k \ne 1.$$

Let us now consider the case where $N_c = 0$. The behavior of the entropies $H(\Phi[S(N)])$, $H(S(N), \Phi)$ as functions of $k$ for $N_c = 0$ is shown in Figure 12.1. Note that, for all $N$, the coherent information $H(\Phi[S(N)]) - H(S(N), \Phi)$ turns out to be positive for $k^2 > \frac12$ and negative otherwise. It tends to $-H(S(N))$ for $k \to 0$, is equal to $H(S(N))$ for $k = 1$, and quickly tends to zero as $k \to \infty$.

If $k < 1$ (ideal attenuator), it follows that $N' = k^2N$, $D = (1-k^2)N + 1$, and

$$I_c(S(N), \Phi) = g(k^2N) - g((1-k^2)N).$$

Exercise 12.48. Show that $I_c(S(N), \Phi)$ is a convex decreasing function of $N$ for $k^2 < \frac12$ (correspondingly, a concave increasing function for $k^2 > \frac12$), hence

$$\sup_N I_c(S(N), \Phi) = \begin{cases} I_c(S(N), \Phi)\big|_{N=0} = 0, & k^2 < \frac12, \\ \lim_{N\to\infty} I_c(S(N), \Phi) = \log\frac{k^2}{1-k^2}, & k^2 > \frac12. \end{cases}$$

In the case where $k > 1$ (ideal amplifier), using (12.171), we have $N' = k^2(N+1) - 1$, $D = k^2(N+1) - N$ and

$$I_c(S(N), \Phi) = g(k^2(N+1) - 1) - g((k^2-1)(N+1)).$$

Exercise 12.49. Show that $I_c(S(N), \Phi)$ is a concave increasing function of $N$ and therefore,

$$\sup_N I_c(S(N), \Phi) = \lim_{N\to\infty} I_c(S(N), \Phi) = \log\frac{k^2}{k^2-1}.$$

Summarizing, we get the general expression (12.184) for the quantum capacity of the attenuator/amplifier with $N_c = 0$. □
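
The limits appearing in the proof can be illustrated numerically. The following Python sketch is an added example with hypothetical function names, assuming the standard form of $g$ with binary logarithms. It evaluates the coherent information (12.188) at a large input power and compares it with the limiting value stated above, which for $N_c = 0$ reduces to (12.184).

```python
import math

def g(x):
    """g(x) = (x + 1) log2(x + 1) - x log2(x), with g(x) = 0 for x <= 0."""
    if x <= 0.0:
        return 0.0
    return ((x + 1.0) * math.log1p(x) - x * math.log(x)) / math.log(2.0)

def coherent_information(n, k, n_c=0.0):
    """I_c(S(N), Phi) from (12.188) for an elementary Gaussian input."""
    n_out = k**2 * n + max(0.0, k**2 - 1.0) + n_c
    d = math.sqrt((n + n_out + 1.0)**2 - 4.0 * k**2 * n * (n + 1.0))
    return (g(n_out)
            - g((d + n_out - n - 1.0) / 2.0)
            - g((d - n_out + n - 1.0) / 2.0))

def limiting_value(k, n_c=0.0):
    """log2(k^2 / |k^2 - 1|) - g(N_c / |k^2 - 1|), valid for k != 1."""
    return math.log2(k**2 / abs(k**2 - 1.0)) - g(n_c / abs(k**2 - 1.0))

for k, n_c in ((0.9, 0.0), (1.5, 0.0), (0.9, 0.05)):
    print(k, n_c, coherent_information(1e7, k, n_c), limiting_value(k, n_c))

# For k^2 < 1/2 the coherent information is negative for N > 0.
print(coherent_information(5.0, 0.6))
```

The large value of $N$ stands in for the supremum only when the coherent information is increasing in $N$, which is the content of Exercises 12.48 and 12.49 for $N_c = 0$.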

The vast area of zero quantum capacity, containing the domain (12.167) of the entanglement-breaking channels, is given by the following proposition.

Proposition 12.50. Let $k > \frac{1}{\sqrt2}$. In this case, $Q(\Phi_{C,k}) = 0$ for

$$N_c \ge \frac12\left(k^2 - |k^2-1|\right) = \min\{k^2, 1\} - \frac12. \qquad (12.189)$$

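Proof. Consider the concatenation $\Phi = \Phi_2 \circ \Phi_1$, where

$$\Phi_1^*[W(z)] = W(\sqrt2\,kz)\exp\left[-\frac12\left(\frac{|2k^2-1|}{2} + N_1\right)(x^2+y^2)\right],$$

$$\Phi_2^*[W(z)] = W\left(\frac{z}{\sqrt2}\right)\exp\left[-\frac12\left(N_2 + \frac14\right)(x^2+y^2)\right],$$

so that, by Proposition 12.47, $Q(\Phi_2) = 0$ and hence $Q(\Phi_2 \circ \Phi_1) = 0$ by (10.35). Then

$$\Phi_1^* \circ \Phi_2^*[W(z)] = W(kz)\exp\left[-\frac12\left(\frac{|k^2-1|}{2} + N_c\right)(x^2+y^2)\right],$$

where

$$\frac{|k^2-1|}{2} + N_c = \left(N_2 + \frac14\right) + \frac12\left(N_1 + \frac{2k^2-1}{2}\right).$$

By varying $N_1, N_2 \ge 0$, one gets all values $N_c$ that satisfy (12.189). □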

In cases A) and D), the channel $\Phi$ is anti-degradable [105], and hence also $Q(\Phi) = 0$.

12.7 Notes and references

1. The Spectral Theorem for self-adjoint operators, Stone’s Theorem and related 
questions are considered in detail in the book of Reed and Simon [171]. 

The quantum harmonic oscillator, first studied by Dirac [53], underlies many mathematical models in quantum optics and quantum electronics. In particular, useful symbolic calculus leading to algebraic identities following from CCR, such as (12.15), was developed in the book of Louisell [151]. An analysis based on coherent states was developed in the classical work of Glauber [69]. Relation (12.22) is the famous $P$-representation, widely used in quantum optics. The representation of the radiation field as an ensemble of quantum oscillators, introduced by Dirac, is discussed in the books of Klauder and Sudarshan [136] and Helstrom [86], in a form suitable for applications in quantum optics.

By using overcompleteness of the system of coherent vectors (Exercise 12.6), one can define the unsharp observable with values in $\mathbb{C} = \mathbb{R}^2$

$$M(B) = \frac{1}{\pi}\int_B |\zeta\rangle\langle\zeta|\, d^2\zeta, \qquad (12.190)$$

which plays a key role in the description of an “approximate joint measurement” of observables $q$ and $p$. A detailed discussion and further references can be found in the books of Davies [48] and Holevo [107].

The formula for the capacity of a c-q Gaussian channel was conjectured by Gordon [71]. In our proof, we follow [98], where one can also find a consideration of the multimode Gaussian c-q channel, including a realistic stochastic process model of the classical signal on the background of colored quantum noise. The capacity of the channel with squeezed Gaussian noise was computed by Holevo, Sohma, and Hirota in [112]. Much more detailed information concerning the rate of convergence of the error probability for pure state Gaussian channels can be obtained by modifying the estimates from Section 5.7 to channels with infinite alphabets and constrained input [113].

2. The usefulness of the symmetric description (12.47) of CCR was stressed by Segal [181]. For the proof of the following theorem, see, e.g. [107].

3. Stone-von Neumann Uniqueness Theorem. Let $V_x, U_y$; $x, y \in \mathbb{R}^s$ be two strongly continuous families of unitary operators in a separable Hilbert space $\mathcal{H}$ satisfying the Weyl CCR (12.44). In this case, $V_x, U_y$ are unitarily equivalent to a direct sum of at most a countable number of copies of the Schrödinger representation arising from (12.43). In particular, any irreducible representation is unitarily equivalent to the Schrödinger representation.

The paper of Williamson [222] contains general results concerning normal forms of not necessarily positive quadratic Hamiltonians. For the proof of Lemma 12.12, see also Chapter V of the book [107].

Many physical applications of the linear dynamics of systems with quadratic Hamiltonians are considered in the book of Malkin and Man’ko [155]. A detailed review of the group of real symplectic transformations, with applications to quantum optics and mechanics, is given by Arvind, Dutta, Mukunda, and Simon [10].

4. In our exposition of quantum Gaussian states, we follow Chapter V of the book [107], where the proof of Exercises 12.15, 12.16, 12.19, and of sufficiency of the condition (12.76) can be found. In [107], one can also find an accurate treatment of the expectations of unbounded operators and moments of a quantum state. The approach to quantum Gaussian states via characteristic functions is based on apparent analogies with probability theory and is perhaps the most direct and transparent. An alternative way used in physical papers is via the Wigner distribution function, see, e.g. [195].

The formulas for the entropy of a general quantum state were obtained in [112]. The maximum entropy characterization of the Gaussian states (Lemma 12.25) is a particular case of the maximum entropy principle in statistical mechanics, see e.g. [151], Section 6.6. A similar property of the conditional quantum entropy (Lemma 12.26) was noticed by Eisert and Wolf [55].

The separability criterion for Gaussian states was obtained in the paper of Werner and Wolf [217]. This paper also contains important results concerning Gaussian PPT (positive partial transpose) states, in particular, a proof that in $1 \times N$ bosonic modes, the PPT is sufficient for separability of Gaussians and the construction of a family of Gaussian PPT nonseparable (bound entangled) states in a $2 \times 2$ system. Purification of the Gaussian state was computed in [90].

5. The general description of bosonic linear and Gaussian channels was given in Holevo and Werner [114] (see also the survey [55]), following [92]. By making a canonical decomposition for a general bosonic Gaussian channel studied in this section, it is possible to show that an arbitrary quantum optical communication circuit can be seen as composed of basic building blocks, comprising linear multiport interferometers and a few basic one-mode “non-linear” (mathematically - non-gauge covariant) operations, such as parametric down-converters or squeezers [31].

On the other hand, the Gaussian channels are a special case of quasi-free maps of the CCR-algebra considered by Demoen, Vanheuverzwijn, and Verbeure [50], who had shown that nonnegative definiteness of the matrices (12.125) is necessary and sufficient for complete positivity of the maps $\Phi$. In [50], the problem of unitary dilation of a quasi-free map was considered under a rather special restriction on the operator $K$. The proof of Theorem 12.30 contains the construction of the unitary dilation for an arbitrary Gaussian channel, see Caruso, Eisert, Giovannetti, and Holevo [37] for more detail.

Gaussian observables are essentially Gaussian q-c channels. They are considered in Chapter VI of the book [107]. In particular, it is shown that the Naimark dilation for the unsharp observable (12.190) is given by the joint spectral measure of commuting operators $q + q_C$, $p - p_C$, where the mode $C$ is in the vacuum state.

Gaussian entanglement-breaking channels were studied in the work of Holevo [106]. 

6. The principle “Gaussian channels have Gaussian optimizers” is well known in classical information theory. The hypothesis of optimal Gaussian ensembles extrapolates this principle to quantum Gaussian states and channels, but only partial results are known in this direction. There are indications that the confirmation of this hypothesis should depend on the solution of the additivity problem for quantum Gaussian channels. In the paper of Wolf, Giedke, and Cirac [226], it is shown that if the additivity holds, the average state $\bar{S}_\pi$ of the optimal ensemble should be Gaussian. A closely related conjecture of the Gaussian minimizer for the entanglement of formation has a positive solution for symmetric Gaussian states in a $1 \times 1$ composite bosonic system, as shown by Wolf et al. [227].

Theorem 12.36, proved in Holevo and Werner [114], is one instance of this principle 
for the mutual quantum information. Proposition 12.38, which expresses this principle 
for the quantum capacity of degradable Gaussian channels, is based on the observation 
of Wolf, Perez-Garcia and Giedke [228]. 

Proposition 12.40 is a multimode generalization of a result of Giovannetti, Guha, 
Lloyd, Maccone, Shapiro, and Yuen [67] (Corollary 12.46) for a pure loss channel. 

7. The classification of one-mode Gaussian channels was obtained by Holevo [106], 
see also [36]. A detailed study of their structure, based on explicitly found Kraus 
decompositions, is given in the paper of Ivan, Sabapathy, and Simon [122]. 

The capacities of the attenuation/amplification channel were considered by Holevo and Werner [114]. The upper bound (12.181) for the classical capacity of the attenuation channel was proposed by König and Smith [137], who also derived still better bounds based on a quantum generalization of the entropy power inequality in information theory.

The fact that the quantum capacity is given by the Gaussian expression obtained in [114] was observed by Wolf, Perez-Garcia and Giedke [228]. The regions of zero quantum capacity were described in Caruso, Giovannetti, and Holevo [36]. Solutions to the Exercises from Section 12.6.4 and additional information on the channel with additive classical Gaussian noise are given in the paper [105]. Smith, Smolin, and Yard [199] demonstrated superactivation of the quantum capacity for the pair of zero-capacity Gaussian channels: a one-mode attenuator with $k^2 = 1/2$ and an explicitly constructed two-mode PPT channel.

Numerous physical applications of continuous variable (in particular bosonic Gaussian) information processing systems, including experimental realizations, are discussed in the surveys [32] and [214].